When More Cores Hurts: The Vector Database Scaling Paradox in HPC

Amal Gueroudji; Ian Foster; Kyle Chard; Matthieu Dorier; Nicholas Chia; Philip Carns; Robert Latham; Robert Ross; Robert Underwood; Rochana Chaturvedi

arxiv: 2606.08950 · v1 · pith:YCIUZXW2new · submitted 2026-06-08 · 💻 cs.DC · cs.DB

When More Cores Hurts: The Vector Database Scaling Paradox in HPC

Seth Ockerman , Song Young Oh , Amal Gueroudji , Rochana Chaturvedi , Philip Carns , Nicholas Chia , Matthieu Dorier , Robert Latham

show 7 more authors

Tanwi Mallick Swan Perarnau Robert Underwood Kyle Chard Ian Foster Robert Ross Shivaram Venkataraman

This is my paper

Pith reviewed 2026-06-27 15:17 UTC · model grok-4.3

classification 💻 cs.DC cs.DB

keywords vector databasesHPC scalingquery throughputscientific AIsupercomputersperformance paradoxdistributed workers

0 comments

The pith

Scaling vector databases to more cores on HPC systems can reduce query throughput by up to 30.67%.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests three vector databases on real supercomputers to check their behavior under scientific search tasks such as molecule matching and trajectory analysis. It shows that adding workers sometimes slows queries and that a 16-fold increase in workers produces only a 5.46-fold gain. The cause is a design mismatch: the databases assume cloud-style loose coupling, while HPC systems use fast but different interconnects and node layouts. If the pattern holds, current tools cannot deliver the speed needed for large scientific AI workloads without major changes.

Core claim

When Qdrant, Milvus, and Weaviate are run on production supercomputers with mixed read/write and write-then-read workloads, workload traits limit latency gains, extra cores cut query throughput by as much as 30.67 percent, and moving from 16 to 256 workers yields only a 5.46 times improvement. This scaling paradox reveals the basic mismatch between cloud-oriented vector database designs and HPC hardware.

What carries the argument

The scaling paradox, measured as sublinear or negative throughput change when increasing distributed workers on HPC nodes.

If this is right

Cloud vector database designs require HPC-specific adaptations to avoid throughput loss with added cores.
Workload patterns such as mixed read/write operations constrain how much latency improves at scale.
New vector database architectures must account for HPC interconnects and node memory hierarchies.
Scientific applications relying on vector search will face performance ceilings until the mismatch is addressed.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same core-count penalty may appear when other cloud data services are moved to tightly coupled HPC networks.
Targeted changes to sharding or indexing that reduce cross-node traffic could be tested directly on the reported hardware.

Load-bearing premise

The chosen workload patterns, benchmarks, multimodal embeddings, and scientific dataset represent the demands of emerging scientific AI workloads on HPC systems.

What would settle it

A follow-up run on the same hardware and workloads that achieves near-linear throughput gains up to 256 workers would falsify the paradox.

Figures

Figures reproduced from arXiv: 2606.08950 by Amal Gueroudji, Ian Foster, Kyle Chard, Matthieu Dorier, Nicholas Chia, Philip Carns, Robert Latham, Robert Ross, Robert Underwood, Rochana Chaturvedi, Seth Ockerman, Shivaram Venkataraman, Song Young Oh, Swan Perarnau, Tanwi Mallick.

**Figure 1.** Figure 1: Life cycle of a vector database. V. EVALUATING THE VDB LIFE CYCLE ON HPC SYSTEMS In the absence of VDB-focused studies on HPC systems, the community lacks a baseline for expected state-of-the-art performance and an understanding of how existing designs’ strengths and weaknesses manifest. To address this gap, we evaluate two representative workload patterns that span the spectrum of VDB usage in both indust… view at source ↗

**Figure 2.** Figure 2: Pes2o-VE: Testing the impact of different storage [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 4.** Figure 4: Pes2o-VE: testing insertion scaling on Aurora. [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗

**Figure 5.** Figure 5: Pes2o-VE-10M and Yandex-T2I=10M index core test [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

**Figure 6.** Figure 6: Query performance with varied cores for Yandex-T2I-10M and Pes2o-VE-10M on Aurora. [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗

**Figure 7.** Figure 7: Distributed query testing with the full Pes2o-VE dataset on Aurora. Note that because of runtime failures while using [PITH_FULL_IMAGE:figures/full_fig_p010_7.png] view at source ↗

read the original abstract

Vector databases have been designed and optimized for cloud environments; however, emerging scientific AI workloads (e.g., molecular search, meteorological trajectory detection, and literature-driven hypothesis generation) demand efficient, scalable execution on HPC systems. We present a large-scale evaluation of three state-of-the-art vector databases -- Qdrant, Milvus, and Weaviate -- on two production supercomputers, scaling to 256 distributed workers across 64 compute nodes. We evaluate representative workload patterns -- mixed read/write and write-then-read -- using popular benchmarks, multimodal embeddings, and a novel real-world scientific dataset. Our results reveal that workload characteristics can limit latency reduction, additional cores can reduce query throughput by up to 30.67%, and scaling from 16 to 256 workers (16x) only yields a 5.46x improvement. This scaling paradox exposes the fundamental mismatch between cloud-oriented designs and HPC systems, highlighting the need for new, HPC-aware vector database designs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper measures negative scaling with more cores in vector DBs on real supercomputers, but the workload representativeness is the part that needs checking before the mismatch claim lands.

read the letter

The main thing here is that the authors ran Qdrant, Milvus, and Weaviate at up to 256 workers on two production supercomputers and recorded that extra cores cut throughput by as much as 30.67 percent while 16x more workers only produced 5.46x better performance under their mixed read/write and write-then-read patterns.

They did the work of moving these tests off cloud testbeds and onto actual HPC hardware with both standard benchmarks and a new scientific dataset. That scale and setting is new for this topic and gives concrete numbers that cloud-focused papers usually lack.

The soft spot is exactly the one the stress-test note flags. The headline claim about a fundamental cloud-HPC mismatch rests on whether the chosen workload patterns, multimodal embeddings, and dataset actually match what molecular search or trajectory detection look like in practice. If the access patterns or data distributions differ, the negative scaling could be an artifact of the test suite rather than evidence of a design flaw that applies more broadly. The abstract also gives no error bars or hardware configuration details, so the 30 percent figure is hard to assess for variability without the full methods.

This is a pure measurement study with no fitted parameters or circular derivations, which keeps the circularity burden low. The experimental design itself looks straightforward.

People who run embedding workloads on national supercomputers would find the scaling data useful to see. It is worth sending to peer review so the community can examine the workload choices and experimental controls directly.

Referee Report

3 major / 2 minor

Summary. The paper evaluates three vector databases (Qdrant, Milvus, Weaviate) on two production supercomputers, scaling to 256 distributed workers across 64 nodes. Using mixed read/write and write-then-read workload patterns with popular benchmarks, multimodal embeddings, and a novel real-world scientific dataset, it reports that additional cores can reduce query throughput by up to 30.67% and that scaling from 16 to 256 workers yields only a 5.46x improvement, concluding that this scaling paradox exposes a fundamental mismatch between cloud-oriented vector database designs and HPC systems for emerging scientific AI workloads.

Significance. If the reported scaling behaviors hold under representative conditions, the work is significant for identifying concrete performance limitations when deploying vector databases on HPC platforms for scientific applications such as molecular search and trajectory detection. The large-scale empirical measurements on production systems provide practical evidence that could motivate development of HPC-aware vector database architectures.

major comments (3)

[Abstract and §4 (Results)] Abstract and §4 (Results): The headline quantitative claims (throughput reduction of up to 30.67% and 5.46x speedup from 16x workers) are stated without error bars, number of experimental repetitions, statistical tests, or variance measures. This directly affects verifiability of the central empirical claim that additional cores hurt performance.
[§3 (Workloads and Dataset)] §3 (Workloads and Dataset): The conclusion of a fundamental cloud-HPC mismatch rests on the assertion that the chosen mixed read/write and write-then-read patterns, benchmarks, multimodal embeddings, and novel scientific dataset are representative of emerging scientific AI workloads. No quantitative validation, access-pattern comparison, or justification against real tasks (e.g., molecular search or hypothesis generation) is provided, leaving the representativeness assumption untested.
[§2 (Experimental Setup)] §2 (Experimental Setup): Hardware configuration details, node specifications, network topology, and exclusion criteria for the two supercomputers are not supplied, preventing independent assessment or reproduction of the scaling measurements.

minor comments (2)

[Abstract] Abstract: The two production supercomputers are not named; early identification would improve clarity.
Figure captions and tables: Axis labels and legends should explicitly state whether throughput is normalized or absolute to avoid ambiguity in interpreting the negative scaling results.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments, which help improve the clarity and reproducibility of our work. We address each major point below.

read point-by-point responses

Referee: [Abstract and §4 (Results)] The headline quantitative claims (throughput reduction of up to 30.67% and 5.46x speedup from 16x workers) are stated without error bars, number of experimental repetitions, statistical tests, or variance measures. This directly affects verifiability of the central empirical claim that additional cores hurt performance.

Authors: We agree that including statistical measures would enhance verifiability. Although the experiments were repeated to ensure reliability, variance was not reported. In the revised manuscript, we will include error bars representing standard deviation, specify the number of repetitions (typically 5 per configuration), and note any statistical considerations in §4. revision: yes
Referee: [§3 (Workloads and Dataset)] The conclusion of a fundamental cloud-HPC mismatch rests on the assertion that the chosen mixed read/write and write-then-read patterns, benchmarks, multimodal embeddings, and novel scientific dataset are representative of emerging scientific AI workloads. No quantitative validation, access-pattern comparison, or justification against real tasks (e.g., molecular search or hypothesis generation) is provided, leaving the representativeness assumption untested.

Authors: The workload patterns were derived from standard vector database benchmarks and extended with a novel dataset from scientific applications. We will revise §3 to provide additional justification, including references to similar access patterns in molecular search literature, while acknowledging that a full quantitative validation against all possible scientific tasks is beyond the scope of this study. revision: partial
Referee: [§2 (Experimental Setup)] Hardware configuration details, node specifications, network topology, and exclusion criteria for the two supercomputers are not supplied, preventing independent assessment or reproduction of the scaling measurements.

Authors: We will expand §2 with comprehensive details on the hardware configurations of both supercomputers, including node specifications (CPU, memory, accelerators if any), network topology (e.g., InfiniBand), and any node selection criteria used during experiments. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical measurement study

full rationale

The paper reports direct measurements of query throughput, latency, and scaling on production supercomputers using chosen benchmarks and workloads. No equations, derivations, fitted parameters, or predictions are present that could reduce results to inputs by construction. Claims rest on observed data from the evaluation rather than any self-referential modeling or self-citation chains. This is the expected outcome for an empirical scaling study.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Empirical benchmarking study; the central claim rests on the assumption that the selected workloads and datasets represent scientific AI use cases on HPC. No free parameters or invented entities are introduced.

axioms (1)

domain assumption The evaluated workload patterns and datasets accurately reflect demands of scientific AI workloads on HPC systems.
Invoked to generalize the observed scaling paradox to a fundamental mismatch between cloud designs and HPC.

pith-pipeline@v0.9.1-grok · 5751 in / 1233 out tokens · 19224 ms · 2026-06-27T15:17:24.577763+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

115 extracted references · 81 canonical work pages · 17 internal anchors

[1]

Efficient Estimation of Word Representations in Vector Space

T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” 2013. [Online]. Available: https://doi.org/10.48550/arXiv.1301.3781

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1301.3781 2013
[2]

GloVe: Global vectors for word representation,

J. Pennington, R. Socher, and C. Manning, “GloVe: Global vectors for word representation,” inProceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), A. Moschitti, B. Pang, and W. Daelemans, Eds. Doha, Qatar: Association for Computational Linguistics, Oct. 2014, pp. 1532–1543. [Online]. Available: https://doi.org/10....

work page doi:10.3115/v1/d14-1162 2014
[3]

Generative AI of pinecone vector retrieval and retrieval-augmented generation architecture: Financial data-driven intelligent customer recommendation system,

Q. Hu, X. Li, Z. Li, and Y . Zhang, “Generative AI of pinecone vector retrieval and retrieval-augmented generation architecture: Financial data-driven intelligent customer recommendation system,” inProceedings of the 2025 2nd International Conference on Digital Economy and Computer Science, ser. DECS ’25. New York, NY , USA: Association for Computing Mach...

work page doi:10.1145/3785706.3785900 2025
[4]

A curriculum recommendation system using a vector database for challenge-based learning,

O. B. Akhiiezer, O. A. Haluza, L. M. Lyubchyk, and V . Y . Sokol, “A curriculum recommendation system using a vector database for challenge-based learning,” inJournal of Physics Conference Series, ser. Journal of Physics Conference Series, vol. 3105. IOP, Sep. 2025, p. 012023. [Online]. Available: https://doi.org/10.1088/1742-6596/3105/1/012023

work page doi:10.1088/1742-6596/3105/1/012023 2025
[5]

SMURF: federated multimodal retrieval for scientific data via embedding alignment,

S. Y . Oh, A. Khan, I. Foster, and K. Chard, “SMURF: federated multimodal retrieval for scientific data via embedding alignment,” in 2025 IEEE International Conference on eScience (eScience), 2025, pp. 452–459, https://doi.org/10.1109/eScience65000.2025.00091

work page doi:10.1109/escience65000.2025.00091 2025
[6]

HybridRAG: integrating knowledge graphs and vector retrieval augmented generation for efficient information extraction,

B. Sarmah, B. Hall, R. Rao, S. Patel, S. Pasquali, and D. Mehta, “HybridRAG: integrating knowledge graphs and vector retrieval augmented generation for efficient information extraction,” 2024. [Online]. Available: https://doi.org/10.1145/3677052.3698671

work page doi:10.1145/3677052.3698671 2024
[7]

A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models,

W. Fan, Y . Ding, L. Ning, S. Wang, H. Li, D. Yin, T.-S. Chua, and Q. Li, “A survey on RAG meeting LLMs: Towards retrieval- augmented large language models,” 2024. [Online]. Available: https://doi.org/10.1145/3637528.3671470

work page doi:10.1145/3637528.3671470 2024
[8]

MemGPT: Towards LLMs as operating systems,

C. Packer, S. Wooders, K. Lin, V . Fang, S. G. Patil, I. Stoica, and J. E. Gonzalez, “MemGPT: Towards LLMs as operating systems,”
[9]

MemGPT: Towards LLMs as Operating Systems

[Online]. Available: https://doi.org/10.48550/arXiv.2310.08560

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2310.08560
[10]

MAGMA: A Multi-Graph based Agentic Memory Architecture for AI Agents

D. Jiang, Y . Li, G. Li, and B. Li, “MAGMA: a multi-graph based agentic memory architecture for AI agents,” 2026. [Online]. Available: https://doi.org/10.48550/arXiv.2601.03236

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2601.03236 2026
[11]

MemoryBank: Enhancing Large Language Models with Long-Term Memory

W. Zhong, L. Guo, Q. Gao, H. Ye, and Y . Wang, “MemoryBank: Enhancing large language models with long-term memory,” 2023. [Online]. Available: https://doi.org/10.48550/arXiv.2305.10250 11ChatGPT [104] was used to improve the grammar and phrasing of this work

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2305.10250 2023
[12]

A-MEM: Agentic Memory for LLM Agents

W. Xu, Z. Liang, K. Mei, H. Gao, J. Tan, and Y . Zhang, “A-mem: Agentic memory for LLM agents,” inAdvances in Neural Information Processing Systems, 2025. [Online]. Available: https://doi.org/10.48550/arXiv.2502.12110

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2502.12110 2025
[13]

Multi-modal foundation model for cosmological simulation data,

B. Xia, N. Ramachandra, A. I. Wells, S. Habib, and J. Wise, “Multi-modal foundation model for cosmological simulation data,”
[14]

Available: https://doi.org/10.48550/arXiv.2510.07684

[Online]. Available: https://doi.org/10.48550/arXiv.2510.07684

work page doi:10.48550/arxiv.2510.07684
[15]

TRAKNN: efficient trajectory aware spatiotemporal kNN for rare meteorological trajectory detection,

G. Coulaud and D. Faranda, “TRAKNN: efficient trajectory aware spatiotemporal kNN for rare meteorological trajectory detection,”
[16]

Available: https://doi.org/10.48550/arXiv.2603.02059

[Online]. Available: https://doi.org/10.48550/arXiv.2603.02059

work page doi:10.48550/arxiv.2603.02059
[17]

Dual retrieving and ranking medical large language model with retrieval augmented generation,

Q. Yang, H. Zuo, R. Su, H. Su, T. Zeng, H. Zhou, R. Wang, J. Chen, Y . Lin, Z. Chen, and T. Tan, “Dual retrieving and ranking medical large language model with retrieval augmented generation,”Scientific Reports, vol. 15, 05 2025. [Online]. Available: https://doi.org/10.1038/s41598-025-00724-w

work page doi:10.1038/s41598-025-00724-w 2025
[18]

Exploring distributed vector databases performance on HPC platforms: A study with Qdrant,

S. Ockerman, A. Gueroudji, S. Y . Oh, R. Underwood, N. Chia, K. Chard, R. Ross, and S. Venkataraman, “Exploring distributed vector databases performance on HPC platforms: A study with Qdrant,”
[19]

Available: https://doi.org/10.1145/3731599.3767404

[Online]. Available: https://doi.org/10.1145/3731599.3767404

work page doi:10.1145/3731599.3767404
[20]

A foundational multimodal vision language ai assistant for human pathology,

M. Y . Lu, B. Chen, D. F. K. Williamson, R. J. Chen, K. Ikamura, G. Gerber, I. Liang, L. P. Le, T. Ding, A. V . Parwani, and F. Mahmood, “A foundational multimodal vision language ai assistant for human pathology,” 2023. [Online]. Available: https://doi.org/10.48550/arXiv.2312.07814

work page doi:10.48550/arxiv.2312.07814 2023
[21]

Automating literature screening and curation with applications to computational neuroscience,

Z. Ji, S. Guo, Y . Qiao, and R. A. McDougal, “Automating literature screening and curation with applications to computational neuroscience,”Journal of the American Medical Informatics Association, vol. 31, no. 7, pp. 1463–1470, 05 2024. [Online]. Available: https://doi.org/10.1093/jamia/ocae097

work page doi:10.1093/jamia/ocae097 2024
[22]

and Senior, Catherine A

V . Eyring, S. Bony, G. A. Meehl, C. A. Senior, B. Stevens, R. J. Stouf- fer, and K. E. Taylor, “Overview of the coupled model intercompar- ison project phase 6 (CMIP6) experimental design and organization,” Geoscientific Model Development, vol. 9, no. 5, pp. 1937–1958, 2016, https://doi.org/10.5194/gmd-9-1937-2016

work page doi:10.5194/gmd-9-1937-2016 1937
[23]

The International Genome Sample Resource (IGSR) collection of open human genomic variation resources,

S. Fairley, E. Lowy-Gallego, E. Perry, and P. Flicek, “The International Genome Sample Resource (IGSR) collection of open human genomic variation resources,”Nucleic Acids Research, vol. 48, no. D1, pp. D941–D947, 10 2019. [Online]. Available: https://doi.org/10.1093/nar/gkz836

work page doi:10.1093/nar/gkz836 2019
[24]

What to support when you’re compressing: The state of practice gaps and opportunities for scientific data compression,

F. Cappello, R. Underwood, Y . Alexeev, A. Baker, E. Bozda ˘g, M. Burtscher, K. Chard, S. Di, K. G. Felker, P. C. O’Grady, H. Guo, Y . Huang, P. Jiang, S. Jin, P. Johansson, S. Li, X. Liang, E. Lindahl, P. Lindstrom, Z. Luki ´c, M. Lundborg, D. Lykov, M. Nagaso, K. Sato, A. Singh, S. W. Son, S. Song, W. Tang, D. Tao, J. Tian, K. Yoshii, and K. Zhao, “What...

work page doi:10.1145/3712285.3759856 2025
[25]

Learned protein embeddings for machine learning,

K. K. Yang, Z. Wu, C. N. Bedbrook, and F. H. Arnold, “Learned protein embeddings for machine learning,”Bioinformatics, vol. 34, no. 15, pp. 2642–2648, 03 2018. [Online]. Available: https://doi.org/10.1093/bioinformatics/bty178

work page doi:10.1093/bioinformatics/bty178 2018
[26]

Scalable embedding fusion with protein language models: insights from benchmarking text-integrated representations,

Y . S. Ko, J. Parkinson, and W. Wang, “Scalable embedding fusion with protein language models: insights from benchmarking text-integrated representations,”Briefings in Bioinformatics, vol. 27, no. 1, p. bbag014, 01 2026. [Online]. Available: https://doi.org/10.1093/bib/bbag014

work page doi:10.1093/bib/bbag014 2026
[27]

When protein structure embedding meets large language models,

S. Ali, P. Chourasia, and M. Patterson, “When protein structure embedding meets large language models,”Genes, vol. 15, no. 1, 2024. [Online]. Available: https://doi.org/10.3390/genes15010025

work page doi:10.3390/genes15010025 2024
[28]

Milvus: A purpose-built vector data management system,

J. Wang, X. Yi, R. Guo, H. Jin, P. Xu, S. Li, X. Wang, X. Guo, C. Li, X. Xu, K. Yu, Y . Yuan, Y . Zou, J. Long, Y . Cai, Z. Li, Z. Zhang, Y . Mo, J. Gu, R. Jiang, Y . Wei, and C. Xie, “Milvus: A purpose-built vector data management system,” inProceedings of the 2021 International Conference on Management of Data, ser. SIGMOD ’21. New York, NY , USA: Assoc...

work page doi:10.1145/3448016.3457550 2021
[29]

[Online]

Vespa Team, “Vespa,” 2026. [Online]. Available: https://vespa.ai/

2026
[30]

[Online]

Qdrant Team, “Qdrant,” 2026. [Online]. Available: https://qdrant.tech/

2026
[31]

[Online]

Vald Team, “Vald,” 2026. [Online]. Available: https://vald.vdaas.org/

2026
[32]

Weaviate,

Weaviate Team, “Weaviate,” 2026. [Online]. Available: https: //weaviate.io/

2026
[33]

Billion-scale similarity search with GPUs

J. Johnson, M. Douze, and H. J ´egou, “Billion-scale similarity search with GPUs,” 2017, https://doi.org/10.48550/arXiv.1702.08734

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1702.08734 2017
[34]

Cloud-native vector search: A comprehensive performance analysis,

Z. Li, W. Ding, S. Huang, Z. Wang, Y . Lin, K. Wu, Y . Park, and J. Chen, “Cloud-native vector search: A comprehensive performance analysis,”
[35]

Available: https://doi.org/10.48550/arXiv.2511.14748

[Online]. Available: https://doi.org/10.48550/arXiv.2511.14748

work page doi:10.48550/arxiv.2511.14748
[36]

Evaluating the cloud for capability class leadership workloads,

J. R. Lange and et al., “Evaluating the cloud for capability class leadership workloads,” Oak Ridge National Laboratory, Technical Report ORNL/TM-2023/3083, September 2023. [Online]. Available: https://doi.org/10.2172/2000306

work page doi:10.2172/2000306 2023
[37]

Usability evaluation of cloud for HPC applications,

V . Sochat, D. Milroy, A. Sarkar, A. Marathe, and T. Patki, “Usability evaluation of cloud for HPC applications,” 2025. [Online]. Available: https://doi.org/10.1145/3731599.3767353

work page doi:10.1145/3731599.3767353 2025
[38]

A performance comparison of hpc workloads on traditional and cloud-based HPC clusters,

V . Munhoz, A. Bonfils, M. Castro, and O. Mendizabal, “A performance comparison of hpc workloads on traditional and cloud-based HPC clusters,” in2023 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW), 2023, pp. 108–114, https://doi.org/10.1109/SBAC-PADW60351.2023.00026

work page doi:10.1109/sbac-padw60351.2023.00026 2023
[39]

Ceems: A resource manager agnostic en- ergy and emissions monitoring stack,

R. Latham, R. B. Ross, P. Carns, S. Snyder, K. Harms, K. Velusamy, P. Coffman, and G. McPheeters, “Initial experiences with DAOS object storage on Aurora,” inProceedings of the SC ’24 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, ser. SC- W ’24. IEEE Press, 2025, pp. 1304–1310. [Online]. Available...

work page doi:10.1109/scw63240.2024.00171 2025
[40]

AI and HPC applications on leadership computing platforms: Performance and scalability studies,

J. Kwack, C. Bertoni, U. Unnikrishnan, R. Balin, K. Hossain, Y . Ghadar, T. J. Williams, A. Bagusetty, M. Thavappiragasam, V . Hatanp ¨a¨a, A. Vasan, J. Tramm, and S. Parker, “AI and HPC applications on leadership computing platforms: Performance and scalability studies,” in2025 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 202...

work page doi:10.1109/ipdps64566.2025.00027 2025
[41]

Polaris and acceptance testing

B. Homerding, B. Lenard, C. Blackworth, C. Holohan, A. Kulyavtsev, G. McPheeters, E. Pershy, P. Rich, D. Waldron, M. Zhang, K. Harms, T. Leggett, and W. Allcock, “Polaris and acceptance testing.” CUG, 2023, https://cug.org/proceedings/cug2023 proceedings/includes/files/ pap109s2-file1.pdf

2023
[42]

DAOS: A scale-out high performance storage stack for storage class memory,

Z. Liang, J. Lombardi, M. Chaarawi, and M. Hennecke, “DAOS: A scale-out high performance storage stack for storage class memory,” inSupercomputing Frontiers: 6th Asian Conference, SCFA 2020, Singapore, February 24–27, 2020, Proceedings. Berlin, Heidelberg: Springer-Verlag, 2020, pp. 40–54. [Online]. Available: https://doi.org/10.1007/978-3-030-48842-0 3

work page doi:10.1007/978-3-030-48842-0 2020
[43]

Efficient indexing of billion-scale datasets of deep descriptors,

A. B. Yandex and V . Lempitsky, “Efficient indexing of billion-scale datasets of deep descriptors,” in2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2055–2063. [Online]. Available: https://doi.org/10.1109/CVPR.2016.226

work page doi:10.1109/cvpr.2016.226 2016
[44]

Product quantization for nearest neighbor search,

H. J ´egou, M. Douze, and C. Schmid, “Product quantization for nearest neighbor search,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 1, pp. 117–128, 2011. [Online]. Available: https://doi.org/10.1109/TPAMI.2010.57

work page doi:10.1109/tpami.2010.57 2011
[45]

dbpedia-entities- openai-1m,

Kumar Shivendu and Nirant Kasliwal, “dbpedia-entities- openai-1m,” 2023, https://huggingface.co/datasets/KShivendu/ dbpedia-entities-openai-1M

2023
[46]

MOFA: discovering materials for carbon capture with a GenAI- and simulation-based workflow,

X. Yan, N. Hudson, H. Park, D. Grzenda, J. G. Pauloski, M. Schwarting, H. Pan, H. Harb, S. Foreman, C. Knight, T. Gibbs, K. Chard, S. Chaudhuri, E. Tajkhorshid, I. Foster, M. Moosavi, L. Ward, and E. A. Huerta, “MOFA: discovering materials for carbon capture with a GenAI- and simulation-based workflow,” 2025. [Online]. Available: https://doi.org/10.48550/...

work page doi:10.48550/arxiv.2501.10651 2025
[47]

Osprey: Production-ready agentic AI for safety-critical control systems,

T. Hellert, J. Montenegro, and A. Sulc, “Osprey: Production-ready agentic AI for safety-critical control systems,” 2025. [Online]. Available: https://doi.org/10.1063/5.0306302

work page doi:10.1063/5.0306302 2025
[48]

The Role of Local Intrinsic Dimensionality in Benchmarking Nearest Neighbor Search

M. Aum ¨uller and M. Ceccarello, “The role of local intrinsic dimensionality in benchmarking nearest neighbor search,” 2019. [Online]. Available: https://doi.org/10.48550/arXiv.1907.07387

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1907.07387 2019
[49]

On the correlation be- tween local intrinsic dimensionality and outlierness,

M. E. Houle, E. Schubert, and A. Zimek, “On the correlation be- tween local intrinsic dimensionality and outlierness,” inSimilarity Search and Applications: 11th International Conference, SISAP 2018, Lima, Peru, October 7–9, 2018, Proceedings. Berlin, Heidelberg: Springer-Verlag, 2018, pp. 177–191, https://dl.acm.org/doi/10.1007/ 978-3-030-02224-2 14

2018
[50]

The impacts of data, ordering, and intrinsic dimensionality on recall in hierarchical navigable small worlds,

O. P. Elliott and J. Clark, “The impacts of data, ordering, and intrinsic dimensionality on recall in hierarchical navigable small worlds,” in Proceedings of the 2024 ACM SIGIR International Conference on Theory of Information Retrieval, ser. ICTIR ’24. ACM, Aug. 2024, p. 25–33. [Online]. Available: https://doi.org/10.1145/3664190.3672512

work page doi:10.1145/3664190.3672512 2024
[51]

The Lustre Storage Architecture

P. Braam, “The Lustre storage architecture,” 2019. [Online]. Available: https://doi.org/10.48550/arXiv.1903.01955

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1903.01955 2019
[52]

Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models

Y . Zhang, M. Li, D. Long, X. Zhang, H. Lin, B. Yang, P. Xie, A. Yang, D. Liu, J. Lin, F. Huang, and J. Zhou, “Qwen3 embedding: Advancing text embedding and reranking through foundation models,”arXiv preprint arXiv:2506.05176, 2025. [Online]. Available: https://doi.org/10.48550/arXiv.2506.05176

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2506.05176 2025
[53]

NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models

C. Lee, R. Roy, M. Xu, J. Raiman, M. Shoeybi, B. Catanzaro, and W. Ping, “NV-Embed: Improved techniques for training LLMs as generalist embedding models,” 2025. [Online]. Available: https://doi.org/10.48550/arXiv.2405.17428

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2405.17428 2025
[54]

Scalable nearest neighbor algorithms for high dimensional data,

M. Muja and D. G. Lowe, “Scalable nearest neighbor algorithms for high dimensional data,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, pp. 2227–2240, 2014. [Online]. Available: https://doi.org/10.1109/TPAMI.2014.2321376

work page doi:10.1109/tpami.2014.2321376 2014
[55]

Similarity search in high dimensions via hashing,

A. Gionis, P. Indyk, and R. Motwani, “Similarity search in high dimensions via hashing,” inProceedings of the 25th International Conference on Very Large Data Bases, ser. VLDB ’99. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1999, p. 518–529. [Online]. Available: https://dl.acm.org/doi/10.5555/645925.671516

work page doi:10.5555/645925.671516 1999
[56]

Data structures and algorithms for nearest neighbor search in general metric spaces,

P. N. Yianilos, “Data structures and algorithms for nearest neighbor search in general metric spaces,” inProceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, ser. SODA ’93. USA: Society for Industrial and Applied Mathematics, 1993, pp. 311–321. [Online]. Available: https://dl.acm.org/doi/10.5555/313559.313789

work page doi:10.5555/313559.313789 1993
[57]

Approximate nearest neighbors: towards removing the curse of dimensionality,

P. Indyk and R. Motwani, “Approximate nearest neighbors: towards removing the curse of dimensionality,” inProceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, ser. STOC ’98. New York, NY , USA: Association for Computing Machinery, 1998, pp. 604–613. [Online]. Available: https://doi.org/10.1145/276698.276876

work page doi:10.1145/276698.276876 1998
[58]

Multidimensional binary search trees used for associative searching,

J. L. Bentley, “Multidimensional binary search trees used for associative searching,”Commun. ACM, vol. 18, no. 9, pp. 509–517, Sep. 1975. [Online]. Available: https://doi.org/10.1145/361002.361007

work page doi:10.1145/361002.361007 1975
[59]

R-trees: a dynamic index structure for spatial searching,

A. Guttman, “R-trees: a dynamic index structure for spatial searching,” inProceedings of the 1984 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD ’84. New York, NY , USA: Association for Computing Machinery, 1984, pp. 47–57. [Online]. Available: https://doi.org/10.1145/602259.602266

work page doi:10.1145/602259.602266 1984
[60]

Efficient in-memory inverted indexes: Theory and practice,

J. Mackenzie, S. MacAvaney, A. Mallia, and M. Siedlaczek, “Efficient in-memory inverted indexes: Theory and practice,” inProceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, ser. SIGIR ’25. New York, NY , USA: Association for Computing Machinery, 2025, pp. 4102–4105. [Online]. Available: https://...

work page doi:10.1145/3726302.3731688 2025
[61]

Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions,

A. Andoni and P. Indyk, “Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions,”Commun. ACM, vol. 51, no. 1, pp. 117–122, Jan. 2008. [Online]. Available: https://doi.org/10.1145/1327452.1327494

work page doi:10.1145/1327452.1327494 2008
[62]

A survey on locality sensitive hashing algorithms and their applications,

O. Jafari, P. Maurya, P. Nagarkar, K. M. Islam, and C. Crushev, “A survey on locality sensitive hashing algorithms and their applications,”
[63]

Available: https://doi.org/10.48550/arXiv.2102.08942

[Online]. Available: https://doi.org/10.48550/arXiv.2102.08942

work page doi:10.48550/arxiv.2102.08942
[64]

Diskann: Fast accurate billion-point nearest neighbor search on a single node,

S. J. Subramanya, Devvrit, R. Kadekodi, R. Krishaswamy, and H. V . Simhadri, “Diskann: Fast accurate billion-point nearest neighbor search on a single node,” inNeurIPS 2019, November 2019. [Online]. Available: https://dl.acm.org/doi/abs/10.5555/3454287.3455520

work page doi:10.5555/3454287.3455520 2019
[65]

Fast approximate nearest neighbor search with the navigating spreading-out graph,

C. Fu, C. Xiang, C. Wang, and D. Cai, “Fast approximate nearest neighbor search with the navigating spreading-out graph,” 2025. [Online]. Available: https://doi.org/10.14778/3303753.3303754

work page doi:10.14778/3303753.3303754 2025
[66]

CAGRA: highly parallel graph construction and approximate nearest neighbor search for GPUs,

H. Ootomo, A. Naruse, C. Nolet, R. Wang, T. Feher, and Y . Wang, “CAGRA: highly parallel graph construction and approximate nearest neighbor search for GPUs,” 2024. [Online]. Available: https://doi.org/10.48550/arXiv.2308.15136

work page doi:10.48550/arxiv.2308.15136 2024
[67]

Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs

Y . A. Malkov and D. A. Yashunin, “Efficient and robust approxi- mate nearest neighbor search using hierarchical navigable small world graphs,” 2018, https://doi.org/10.48550/arXiv.1603.09320

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1603.09320 2018
[68]

Rapidsai/raft: RAFT contains fundamental widely-used algorithms and primitives for data science, graph and machine learning

Rapidsai, “Rapidsai/raft: RAFT contains fundamental widely-used algorithms and primitives for data science, graph and machine learning.” 2022. [Online]. Available: https://github.com/rapidsai/raft

2022
[69]

Chord: A scalable peer-to-peer lookup service for internet applications,

I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan, “Chord: A scalable peer-to-peer lookup service for internet applications,” inProceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, ser. SIGCOMM ’01. New York, NY , USA: Association for Computing Machinery, 2001, pp. 1...

work page doi:10.1145/383059.383071 2001
[70]

Vector databases for modelling, managing and querying big scientific data: Models, issues, paradigms,

A. Cuzzocrea, “Vector databases for modelling, managing and querying big scientific data: Models, issues, paradigms,” inProceedings of the 37th International Conference on Scalable Scientific Data Manage- ment, ser. SSDBM ’25. New York, NY , USA: Association for Com- puting Machinery, 2025, https://doi.org/10.1145/3733723.3742469

work page doi:10.1145/3733723.3742469 2025
[71]

Vector database management systems: Fundamental concepts, use-cases, and current challenges,

T. Taipalus, “Vector database management systems: Fundamental concepts, use-cases, and current challenges,”Cognitive Systems Research, vol. 85, p. 101216, 2024. [Online]. Available: https: //doi.org/10.48550/arXiv.2309.11322

work page doi:10.48550/arxiv.2309.11322 2024
[72]

2509.05382

J. J. Pan, J. Wang, and G. Li, “Survey of vector database management systems,” 2023. [Online]. Available: https://doi.org/10.48550/arXiv. 2310.14021

work page internal anchor Pith review doi:10.48550/arxiv 2023
[73]

A comprehensive survey on vector database: Storage and retrieval technique, challenge,

Y . Han, C. Liu, and P. Wang, “A comprehensive survey on vector database: Storage and retrieval technique, challenge,”arXiv preprint arXiv:2310.11703, 2023. [Online]. Available: https://doi.org/10.48550/ arXiv.2310.11703

arXiv 2023
[74]

Towards understanding systems trade-offs in retrieval-augmented generation model inference,

M. Shen, M. Umar, K. Maeng, G. E. Suh, and U. Gupta, “Towards understanding systems trade-offs in retrieval-augmented generation model inference,” 2024. [Online]. Available: https: //doi.org/10.48550/arXiv.2412.11854

work page doi:10.48550/arxiv.2412.11854 2024
[75]

A comprehensive survey and experimental comparison of graph-based approximate nearest neighbor search,

M. Wang, X. Xu, Q. Yue, and Y . Wang, “A comprehensive survey and experimental comparison of graph-based approximate nearest neighbor search,”Proc. VLDB Endow., vol. 14, no. 11, p. 1964–1978, Jul

1964
[76]

Available: https://doi.org/10.14778/3476249.3476255

[Online]. Available: https://doi.org/10.14778/3476249.3476255

work page doi:10.14778/3476249.3476255
[77]

ANN-Benchmarks: a benchmarking tool for approximate nearest neighbor algorithms,

M. Aum ¨uller, E. Bernhardsson, and A. Faithfull, “ANN-Benchmarks: a benchmarking tool for approximate nearest neighbor algorithms,”
[78]

ANN-Benchmarks: A Benchmarking Tool for Approximate Nearest Neighbor Algorithms

[Online]. Available: https://doi.org/10.48550/arXiv.1807.05614

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1807.05614
[79]

On- tology generation using large language models

G. Pandit, M. Roder, and A.-C. Ngonga Ngomo, “Evaluating approximate nearest neighbour search systems on knowledge graph embeddings,” inThe Semantic Web: 22nd European Semantic Web Conference, ESWC 2025, Portoroz, Slovenia, June 1–5, 2025, Proceed- ings, Part I. Berlin, Heidelberg: Springer-Verlag, 2025, pp. 59–76. [Online]. Available: https://doi.org/10....

work page doi:10.1007/978-3-031-94575-5 2025
[80]

RAGPerf: An end-to-end benchmarking framework for retrieval-augmented generation systems,

S. Li, Y . Zhou, Y . Xu, K. Chen, D. Waddington, S. Sundararaman, H. Franke, and J. Huang, “RAGPerf: An end-to-end benchmarking framework for retrieval-augmented generation systems,” 2026. [Online]. Available: https://doi.org/10.48550/arXiv.2603.10765

work page doi:10.48550/arxiv.2603.10765 2026

Showing first 80 references.

[1] [1]

Efficient Estimation of Word Representations in Vector Space

T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” 2013. [Online]. Available: https://doi.org/10.48550/arXiv.1301.3781

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1301.3781 2013

[2] [2]

GloVe: Global vectors for word representation,

J. Pennington, R. Socher, and C. Manning, “GloVe: Global vectors for word representation,” inProceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), A. Moschitti, B. Pang, and W. Daelemans, Eds. Doha, Qatar: Association for Computational Linguistics, Oct. 2014, pp. 1532–1543. [Online]. Available: https://doi.org/10....

work page doi:10.3115/v1/d14-1162 2014

[3] [3]

Generative AI of pinecone vector retrieval and retrieval-augmented generation architecture: Financial data-driven intelligent customer recommendation system,

Q. Hu, X. Li, Z. Li, and Y . Zhang, “Generative AI of pinecone vector retrieval and retrieval-augmented generation architecture: Financial data-driven intelligent customer recommendation system,” inProceedings of the 2025 2nd International Conference on Digital Economy and Computer Science, ser. DECS ’25. New York, NY , USA: Association for Computing Mach...

work page doi:10.1145/3785706.3785900 2025

[4] [4]

A curriculum recommendation system using a vector database for challenge-based learning,

O. B. Akhiiezer, O. A. Haluza, L. M. Lyubchyk, and V . Y . Sokol, “A curriculum recommendation system using a vector database for challenge-based learning,” inJournal of Physics Conference Series, ser. Journal of Physics Conference Series, vol. 3105. IOP, Sep. 2025, p. 012023. [Online]. Available: https://doi.org/10.1088/1742-6596/3105/1/012023

work page doi:10.1088/1742-6596/3105/1/012023 2025

[5] [5]

SMURF: federated multimodal retrieval for scientific data via embedding alignment,

S. Y . Oh, A. Khan, I. Foster, and K. Chard, “SMURF: federated multimodal retrieval for scientific data via embedding alignment,” in 2025 IEEE International Conference on eScience (eScience), 2025, pp. 452–459, https://doi.org/10.1109/eScience65000.2025.00091

work page doi:10.1109/escience65000.2025.00091 2025

[6] [6]

HybridRAG: integrating knowledge graphs and vector retrieval augmented generation for efficient information extraction,

B. Sarmah, B. Hall, R. Rao, S. Patel, S. Pasquali, and D. Mehta, “HybridRAG: integrating knowledge graphs and vector retrieval augmented generation for efficient information extraction,” 2024. [Online]. Available: https://doi.org/10.1145/3677052.3698671

work page doi:10.1145/3677052.3698671 2024

[7] [7]

A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models,

W. Fan, Y . Ding, L. Ning, S. Wang, H. Li, D. Yin, T.-S. Chua, and Q. Li, “A survey on RAG meeting LLMs: Towards retrieval- augmented large language models,” 2024. [Online]. Available: https://doi.org/10.1145/3637528.3671470

work page doi:10.1145/3637528.3671470 2024

[8] [8]

MemGPT: Towards LLMs as operating systems,

C. Packer, S. Wooders, K. Lin, V . Fang, S. G. Patil, I. Stoica, and J. E. Gonzalez, “MemGPT: Towards LLMs as operating systems,”

[9] [9]

MemGPT: Towards LLMs as Operating Systems

[Online]. Available: https://doi.org/10.48550/arXiv.2310.08560

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2310.08560

[10] [10]

MAGMA: A Multi-Graph based Agentic Memory Architecture for AI Agents

D. Jiang, Y . Li, G. Li, and B. Li, “MAGMA: a multi-graph based agentic memory architecture for AI agents,” 2026. [Online]. Available: https://doi.org/10.48550/arXiv.2601.03236

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2601.03236 2026

[11] [11]

MemoryBank: Enhancing Large Language Models with Long-Term Memory

W. Zhong, L. Guo, Q. Gao, H. Ye, and Y . Wang, “MemoryBank: Enhancing large language models with long-term memory,” 2023. [Online]. Available: https://doi.org/10.48550/arXiv.2305.10250 11ChatGPT [104] was used to improve the grammar and phrasing of this work

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2305.10250 2023

[12] [12]

A-MEM: Agentic Memory for LLM Agents

W. Xu, Z. Liang, K. Mei, H. Gao, J. Tan, and Y . Zhang, “A-mem: Agentic memory for LLM agents,” inAdvances in Neural Information Processing Systems, 2025. [Online]. Available: https://doi.org/10.48550/arXiv.2502.12110

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2502.12110 2025

[13] [13]

Multi-modal foundation model for cosmological simulation data,

B. Xia, N. Ramachandra, A. I. Wells, S. Habib, and J. Wise, “Multi-modal foundation model for cosmological simulation data,”

[14] [14]

Available: https://doi.org/10.48550/arXiv.2510.07684

[Online]. Available: https://doi.org/10.48550/arXiv.2510.07684

work page doi:10.48550/arxiv.2510.07684

[15] [15]

TRAKNN: efficient trajectory aware spatiotemporal kNN for rare meteorological trajectory detection,

G. Coulaud and D. Faranda, “TRAKNN: efficient trajectory aware spatiotemporal kNN for rare meteorological trajectory detection,”

[16] [16]

Available: https://doi.org/10.48550/arXiv.2603.02059

[Online]. Available: https://doi.org/10.48550/arXiv.2603.02059

work page doi:10.48550/arxiv.2603.02059

[17] [17]

Dual retrieving and ranking medical large language model with retrieval augmented generation,

Q. Yang, H. Zuo, R. Su, H. Su, T. Zeng, H. Zhou, R. Wang, J. Chen, Y . Lin, Z. Chen, and T. Tan, “Dual retrieving and ranking medical large language model with retrieval augmented generation,”Scientific Reports, vol. 15, 05 2025. [Online]. Available: https://doi.org/10.1038/s41598-025-00724-w

work page doi:10.1038/s41598-025-00724-w 2025

[18] [18]

Exploring distributed vector databases performance on HPC platforms: A study with Qdrant,

S. Ockerman, A. Gueroudji, S. Y . Oh, R. Underwood, N. Chia, K. Chard, R. Ross, and S. Venkataraman, “Exploring distributed vector databases performance on HPC platforms: A study with Qdrant,”

[19] [19]

Available: https://doi.org/10.1145/3731599.3767404

[Online]. Available: https://doi.org/10.1145/3731599.3767404

work page doi:10.1145/3731599.3767404

[20] [20]

A foundational multimodal vision language ai assistant for human pathology,

M. Y . Lu, B. Chen, D. F. K. Williamson, R. J. Chen, K. Ikamura, G. Gerber, I. Liang, L. P. Le, T. Ding, A. V . Parwani, and F. Mahmood, “A foundational multimodal vision language ai assistant for human pathology,” 2023. [Online]. Available: https://doi.org/10.48550/arXiv.2312.07814

work page doi:10.48550/arxiv.2312.07814 2023

[21] [21]

Automating literature screening and curation with applications to computational neuroscience,

Z. Ji, S. Guo, Y . Qiao, and R. A. McDougal, “Automating literature screening and curation with applications to computational neuroscience,”Journal of the American Medical Informatics Association, vol. 31, no. 7, pp. 1463–1470, 05 2024. [Online]. Available: https://doi.org/10.1093/jamia/ocae097

work page doi:10.1093/jamia/ocae097 2024

[22] [22]

and Senior, Catherine A

V . Eyring, S. Bony, G. A. Meehl, C. A. Senior, B. Stevens, R. J. Stouf- fer, and K. E. Taylor, “Overview of the coupled model intercompar- ison project phase 6 (CMIP6) experimental design and organization,” Geoscientific Model Development, vol. 9, no. 5, pp. 1937–1958, 2016, https://doi.org/10.5194/gmd-9-1937-2016

work page doi:10.5194/gmd-9-1937-2016 1937

[23] [23]

The International Genome Sample Resource (IGSR) collection of open human genomic variation resources,

S. Fairley, E. Lowy-Gallego, E. Perry, and P. Flicek, “The International Genome Sample Resource (IGSR) collection of open human genomic variation resources,”Nucleic Acids Research, vol. 48, no. D1, pp. D941–D947, 10 2019. [Online]. Available: https://doi.org/10.1093/nar/gkz836

work page doi:10.1093/nar/gkz836 2019

[24] [24]

What to support when you’re compressing: The state of practice gaps and opportunities for scientific data compression,

F. Cappello, R. Underwood, Y . Alexeev, A. Baker, E. Bozda ˘g, M. Burtscher, K. Chard, S. Di, K. G. Felker, P. C. O’Grady, H. Guo, Y . Huang, P. Jiang, S. Jin, P. Johansson, S. Li, X. Liang, E. Lindahl, P. Lindstrom, Z. Luki ´c, M. Lundborg, D. Lykov, M. Nagaso, K. Sato, A. Singh, S. W. Son, S. Song, W. Tang, D. Tao, J. Tian, K. Yoshii, and K. Zhao, “What...

work page doi:10.1145/3712285.3759856 2025

[25] [25]

Learned protein embeddings for machine learning,

K. K. Yang, Z. Wu, C. N. Bedbrook, and F. H. Arnold, “Learned protein embeddings for machine learning,”Bioinformatics, vol. 34, no. 15, pp. 2642–2648, 03 2018. [Online]. Available: https://doi.org/10.1093/bioinformatics/bty178

work page doi:10.1093/bioinformatics/bty178 2018

[26] [26]

Scalable embedding fusion with protein language models: insights from benchmarking text-integrated representations,

Y . S. Ko, J. Parkinson, and W. Wang, “Scalable embedding fusion with protein language models: insights from benchmarking text-integrated representations,”Briefings in Bioinformatics, vol. 27, no. 1, p. bbag014, 01 2026. [Online]. Available: https://doi.org/10.1093/bib/bbag014

work page doi:10.1093/bib/bbag014 2026

[27] [27]

When protein structure embedding meets large language models,

S. Ali, P. Chourasia, and M. Patterson, “When protein structure embedding meets large language models,”Genes, vol. 15, no. 1, 2024. [Online]. Available: https://doi.org/10.3390/genes15010025

work page doi:10.3390/genes15010025 2024

[28] [28]

Milvus: A purpose-built vector data management system,

J. Wang, X. Yi, R. Guo, H. Jin, P. Xu, S. Li, X. Wang, X. Guo, C. Li, X. Xu, K. Yu, Y . Yuan, Y . Zou, J. Long, Y . Cai, Z. Li, Z. Zhang, Y . Mo, J. Gu, R. Jiang, Y . Wei, and C. Xie, “Milvus: A purpose-built vector data management system,” inProceedings of the 2021 International Conference on Management of Data, ser. SIGMOD ’21. New York, NY , USA: Assoc...

work page doi:10.1145/3448016.3457550 2021

[29] [29]

[Online]

Vespa Team, “Vespa,” 2026. [Online]. Available: https://vespa.ai/

2026

[30] [30]

[Online]

Qdrant Team, “Qdrant,” 2026. [Online]. Available: https://qdrant.tech/

2026

[31] [31]

[Online]

Vald Team, “Vald,” 2026. [Online]. Available: https://vald.vdaas.org/

2026

[32] [32]

Weaviate,

Weaviate Team, “Weaviate,” 2026. [Online]. Available: https: //weaviate.io/

2026

[33] [33]

Billion-scale similarity search with GPUs

J. Johnson, M. Douze, and H. J ´egou, “Billion-scale similarity search with GPUs,” 2017, https://doi.org/10.48550/arXiv.1702.08734

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1702.08734 2017

[34] [34]

Cloud-native vector search: A comprehensive performance analysis,

Z. Li, W. Ding, S. Huang, Z. Wang, Y . Lin, K. Wu, Y . Park, and J. Chen, “Cloud-native vector search: A comprehensive performance analysis,”

[35] [35]

Available: https://doi.org/10.48550/arXiv.2511.14748

[Online]. Available: https://doi.org/10.48550/arXiv.2511.14748

work page doi:10.48550/arxiv.2511.14748

[36] [36]

Evaluating the cloud for capability class leadership workloads,

J. R. Lange and et al., “Evaluating the cloud for capability class leadership workloads,” Oak Ridge National Laboratory, Technical Report ORNL/TM-2023/3083, September 2023. [Online]. Available: https://doi.org/10.2172/2000306

work page doi:10.2172/2000306 2023

[37] [37]

Usability evaluation of cloud for HPC applications,

V . Sochat, D. Milroy, A. Sarkar, A. Marathe, and T. Patki, “Usability evaluation of cloud for HPC applications,” 2025. [Online]. Available: https://doi.org/10.1145/3731599.3767353

work page doi:10.1145/3731599.3767353 2025

[38] [38]

A performance comparison of hpc workloads on traditional and cloud-based HPC clusters,

V . Munhoz, A. Bonfils, M. Castro, and O. Mendizabal, “A performance comparison of hpc workloads on traditional and cloud-based HPC clusters,” in2023 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW), 2023, pp. 108–114, https://doi.org/10.1109/SBAC-PADW60351.2023.00026

work page doi:10.1109/sbac-padw60351.2023.00026 2023

[39] [39]

Ceems: A resource manager agnostic en- ergy and emissions monitoring stack,

R. Latham, R. B. Ross, P. Carns, S. Snyder, K. Harms, K. Velusamy, P. Coffman, and G. McPheeters, “Initial experiences with DAOS object storage on Aurora,” inProceedings of the SC ’24 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, ser. SC- W ’24. IEEE Press, 2025, pp. 1304–1310. [Online]. Available...

work page doi:10.1109/scw63240.2024.00171 2025

[40] [40]

AI and HPC applications on leadership computing platforms: Performance and scalability studies,

J. Kwack, C. Bertoni, U. Unnikrishnan, R. Balin, K. Hossain, Y . Ghadar, T. J. Williams, A. Bagusetty, M. Thavappiragasam, V . Hatanp ¨a¨a, A. Vasan, J. Tramm, and S. Parker, “AI and HPC applications on leadership computing platforms: Performance and scalability studies,” in2025 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 202...

work page doi:10.1109/ipdps64566.2025.00027 2025

[41] [41]

Polaris and acceptance testing

B. Homerding, B. Lenard, C. Blackworth, C. Holohan, A. Kulyavtsev, G. McPheeters, E. Pershy, P. Rich, D. Waldron, M. Zhang, K. Harms, T. Leggett, and W. Allcock, “Polaris and acceptance testing.” CUG, 2023, https://cug.org/proceedings/cug2023 proceedings/includes/files/ pap109s2-file1.pdf

2023

[42] [42]

DAOS: A scale-out high performance storage stack for storage class memory,

Z. Liang, J. Lombardi, M. Chaarawi, and M. Hennecke, “DAOS: A scale-out high performance storage stack for storage class memory,” inSupercomputing Frontiers: 6th Asian Conference, SCFA 2020, Singapore, February 24–27, 2020, Proceedings. Berlin, Heidelberg: Springer-Verlag, 2020, pp. 40–54. [Online]. Available: https://doi.org/10.1007/978-3-030-48842-0 3

work page doi:10.1007/978-3-030-48842-0 2020

[43] [43]

Efficient indexing of billion-scale datasets of deep descriptors,

A. B. Yandex and V . Lempitsky, “Efficient indexing of billion-scale datasets of deep descriptors,” in2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2055–2063. [Online]. Available: https://doi.org/10.1109/CVPR.2016.226

work page doi:10.1109/cvpr.2016.226 2016

[44] [44]

Product quantization for nearest neighbor search,

H. J ´egou, M. Douze, and C. Schmid, “Product quantization for nearest neighbor search,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 1, pp. 117–128, 2011. [Online]. Available: https://doi.org/10.1109/TPAMI.2010.57

work page doi:10.1109/tpami.2010.57 2011

[45] [45]

dbpedia-entities- openai-1m,

Kumar Shivendu and Nirant Kasliwal, “dbpedia-entities- openai-1m,” 2023, https://huggingface.co/datasets/KShivendu/ dbpedia-entities-openai-1M

2023

[46] [46]

MOFA: discovering materials for carbon capture with a GenAI- and simulation-based workflow,

X. Yan, N. Hudson, H. Park, D. Grzenda, J. G. Pauloski, M. Schwarting, H. Pan, H. Harb, S. Foreman, C. Knight, T. Gibbs, K. Chard, S. Chaudhuri, E. Tajkhorshid, I. Foster, M. Moosavi, L. Ward, and E. A. Huerta, “MOFA: discovering materials for carbon capture with a GenAI- and simulation-based workflow,” 2025. [Online]. Available: https://doi.org/10.48550/...

work page doi:10.48550/arxiv.2501.10651 2025

[47] [47]

Osprey: Production-ready agentic AI for safety-critical control systems,

T. Hellert, J. Montenegro, and A. Sulc, “Osprey: Production-ready agentic AI for safety-critical control systems,” 2025. [Online]. Available: https://doi.org/10.1063/5.0306302

work page doi:10.1063/5.0306302 2025

[48] [48]

The Role of Local Intrinsic Dimensionality in Benchmarking Nearest Neighbor Search

M. Aum ¨uller and M. Ceccarello, “The role of local intrinsic dimensionality in benchmarking nearest neighbor search,” 2019. [Online]. Available: https://doi.org/10.48550/arXiv.1907.07387

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1907.07387 2019

[49] [49]

On the correlation be- tween local intrinsic dimensionality and outlierness,

M. E. Houle, E. Schubert, and A. Zimek, “On the correlation be- tween local intrinsic dimensionality and outlierness,” inSimilarity Search and Applications: 11th International Conference, SISAP 2018, Lima, Peru, October 7–9, 2018, Proceedings. Berlin, Heidelberg: Springer-Verlag, 2018, pp. 177–191, https://dl.acm.org/doi/10.1007/ 978-3-030-02224-2 14

2018

[50] [50]

The impacts of data, ordering, and intrinsic dimensionality on recall in hierarchical navigable small worlds,

O. P. Elliott and J. Clark, “The impacts of data, ordering, and intrinsic dimensionality on recall in hierarchical navigable small worlds,” in Proceedings of the 2024 ACM SIGIR International Conference on Theory of Information Retrieval, ser. ICTIR ’24. ACM, Aug. 2024, p. 25–33. [Online]. Available: https://doi.org/10.1145/3664190.3672512

work page doi:10.1145/3664190.3672512 2024

[51] [51]

The Lustre Storage Architecture

P. Braam, “The Lustre storage architecture,” 2019. [Online]. Available: https://doi.org/10.48550/arXiv.1903.01955

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1903.01955 2019

[52] [52]

Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models

Y . Zhang, M. Li, D. Long, X. Zhang, H. Lin, B. Yang, P. Xie, A. Yang, D. Liu, J. Lin, F. Huang, and J. Zhou, “Qwen3 embedding: Advancing text embedding and reranking through foundation models,”arXiv preprint arXiv:2506.05176, 2025. [Online]. Available: https://doi.org/10.48550/arXiv.2506.05176

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2506.05176 2025

[53] [53]

NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models

C. Lee, R. Roy, M. Xu, J. Raiman, M. Shoeybi, B. Catanzaro, and W. Ping, “NV-Embed: Improved techniques for training LLMs as generalist embedding models,” 2025. [Online]. Available: https://doi.org/10.48550/arXiv.2405.17428

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2405.17428 2025

[54] [54]

Scalable nearest neighbor algorithms for high dimensional data,

M. Muja and D. G. Lowe, “Scalable nearest neighbor algorithms for high dimensional data,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, pp. 2227–2240, 2014. [Online]. Available: https://doi.org/10.1109/TPAMI.2014.2321376

work page doi:10.1109/tpami.2014.2321376 2014

[55] [55]

Similarity search in high dimensions via hashing,

A. Gionis, P. Indyk, and R. Motwani, “Similarity search in high dimensions via hashing,” inProceedings of the 25th International Conference on Very Large Data Bases, ser. VLDB ’99. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1999, p. 518–529. [Online]. Available: https://dl.acm.org/doi/10.5555/645925.671516

work page doi:10.5555/645925.671516 1999

[56] [56]

Data structures and algorithms for nearest neighbor search in general metric spaces,

P. N. Yianilos, “Data structures and algorithms for nearest neighbor search in general metric spaces,” inProceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, ser. SODA ’93. USA: Society for Industrial and Applied Mathematics, 1993, pp. 311–321. [Online]. Available: https://dl.acm.org/doi/10.5555/313559.313789

work page doi:10.5555/313559.313789 1993

[57] [57]

Approximate nearest neighbors: towards removing the curse of dimensionality,

P. Indyk and R. Motwani, “Approximate nearest neighbors: towards removing the curse of dimensionality,” inProceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, ser. STOC ’98. New York, NY , USA: Association for Computing Machinery, 1998, pp. 604–613. [Online]. Available: https://doi.org/10.1145/276698.276876

work page doi:10.1145/276698.276876 1998

[58] [58]

Multidimensional binary search trees used for associative searching,

J. L. Bentley, “Multidimensional binary search trees used for associative searching,”Commun. ACM, vol. 18, no. 9, pp. 509–517, Sep. 1975. [Online]. Available: https://doi.org/10.1145/361002.361007

work page doi:10.1145/361002.361007 1975

[59] [59]

R-trees: a dynamic index structure for spatial searching,

A. Guttman, “R-trees: a dynamic index structure for spatial searching,” inProceedings of the 1984 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD ’84. New York, NY , USA: Association for Computing Machinery, 1984, pp. 47–57. [Online]. Available: https://doi.org/10.1145/602259.602266

work page doi:10.1145/602259.602266 1984

[60] [60]

Efficient in-memory inverted indexes: Theory and practice,

J. Mackenzie, S. MacAvaney, A. Mallia, and M. Siedlaczek, “Efficient in-memory inverted indexes: Theory and practice,” inProceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, ser. SIGIR ’25. New York, NY , USA: Association for Computing Machinery, 2025, pp. 4102–4105. [Online]. Available: https://...

work page doi:10.1145/3726302.3731688 2025

[61] [61]

Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions,

A. Andoni and P. Indyk, “Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions,”Commun. ACM, vol. 51, no. 1, pp. 117–122, Jan. 2008. [Online]. Available: https://doi.org/10.1145/1327452.1327494

work page doi:10.1145/1327452.1327494 2008

[62] [62]

A survey on locality sensitive hashing algorithms and their applications,

O. Jafari, P. Maurya, P. Nagarkar, K. M. Islam, and C. Crushev, “A survey on locality sensitive hashing algorithms and their applications,”

[63] [63]

Available: https://doi.org/10.48550/arXiv.2102.08942

[Online]. Available: https://doi.org/10.48550/arXiv.2102.08942

work page doi:10.48550/arxiv.2102.08942

[64] [64]

Diskann: Fast accurate billion-point nearest neighbor search on a single node,

S. J. Subramanya, Devvrit, R. Kadekodi, R. Krishaswamy, and H. V . Simhadri, “Diskann: Fast accurate billion-point nearest neighbor search on a single node,” inNeurIPS 2019, November 2019. [Online]. Available: https://dl.acm.org/doi/abs/10.5555/3454287.3455520

work page doi:10.5555/3454287.3455520 2019

[65] [65]

Fast approximate nearest neighbor search with the navigating spreading-out graph,

C. Fu, C. Xiang, C. Wang, and D. Cai, “Fast approximate nearest neighbor search with the navigating spreading-out graph,” 2025. [Online]. Available: https://doi.org/10.14778/3303753.3303754

work page doi:10.14778/3303753.3303754 2025

[66] [66]

CAGRA: highly parallel graph construction and approximate nearest neighbor search for GPUs,

H. Ootomo, A. Naruse, C. Nolet, R. Wang, T. Feher, and Y . Wang, “CAGRA: highly parallel graph construction and approximate nearest neighbor search for GPUs,” 2024. [Online]. Available: https://doi.org/10.48550/arXiv.2308.15136

work page doi:10.48550/arxiv.2308.15136 2024

[67] [67]

Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs

Y . A. Malkov and D. A. Yashunin, “Efficient and robust approxi- mate nearest neighbor search using hierarchical navigable small world graphs,” 2018, https://doi.org/10.48550/arXiv.1603.09320

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1603.09320 2018

[68] [68]

Rapidsai/raft: RAFT contains fundamental widely-used algorithms and primitives for data science, graph and machine learning

Rapidsai, “Rapidsai/raft: RAFT contains fundamental widely-used algorithms and primitives for data science, graph and machine learning.” 2022. [Online]. Available: https://github.com/rapidsai/raft

2022

[69] [69]

Chord: A scalable peer-to-peer lookup service for internet applications,

I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan, “Chord: A scalable peer-to-peer lookup service for internet applications,” inProceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, ser. SIGCOMM ’01. New York, NY , USA: Association for Computing Machinery, 2001, pp. 1...

work page doi:10.1145/383059.383071 2001

[70] [70]

Vector databases for modelling, managing and querying big scientific data: Models, issues, paradigms,

A. Cuzzocrea, “Vector databases for modelling, managing and querying big scientific data: Models, issues, paradigms,” inProceedings of the 37th International Conference on Scalable Scientific Data Manage- ment, ser. SSDBM ’25. New York, NY , USA: Association for Com- puting Machinery, 2025, https://doi.org/10.1145/3733723.3742469

work page doi:10.1145/3733723.3742469 2025

[71] [71]

Vector database management systems: Fundamental concepts, use-cases, and current challenges,

T. Taipalus, “Vector database management systems: Fundamental concepts, use-cases, and current challenges,”Cognitive Systems Research, vol. 85, p. 101216, 2024. [Online]. Available: https: //doi.org/10.48550/arXiv.2309.11322

work page doi:10.48550/arxiv.2309.11322 2024

[72] [72]

2509.05382

J. J. Pan, J. Wang, and G. Li, “Survey of vector database management systems,” 2023. [Online]. Available: https://doi.org/10.48550/arXiv. 2310.14021

work page internal anchor Pith review doi:10.48550/arxiv 2023

[73] [73]

A comprehensive survey on vector database: Storage and retrieval technique, challenge,

Y . Han, C. Liu, and P. Wang, “A comprehensive survey on vector database: Storage and retrieval technique, challenge,”arXiv preprint arXiv:2310.11703, 2023. [Online]. Available: https://doi.org/10.48550/ arXiv.2310.11703

arXiv 2023

[74] [74]

Towards understanding systems trade-offs in retrieval-augmented generation model inference,

M. Shen, M. Umar, K. Maeng, G. E. Suh, and U. Gupta, “Towards understanding systems trade-offs in retrieval-augmented generation model inference,” 2024. [Online]. Available: https: //doi.org/10.48550/arXiv.2412.11854

work page doi:10.48550/arxiv.2412.11854 2024

[75] [75]

A comprehensive survey and experimental comparison of graph-based approximate nearest neighbor search,

M. Wang, X. Xu, Q. Yue, and Y . Wang, “A comprehensive survey and experimental comparison of graph-based approximate nearest neighbor search,”Proc. VLDB Endow., vol. 14, no. 11, p. 1964–1978, Jul

1964

[76] [76]

Available: https://doi.org/10.14778/3476249.3476255

[Online]. Available: https://doi.org/10.14778/3476249.3476255

work page doi:10.14778/3476249.3476255

[77] [77]

ANN-Benchmarks: a benchmarking tool for approximate nearest neighbor algorithms,

M. Aum ¨uller, E. Bernhardsson, and A. Faithfull, “ANN-Benchmarks: a benchmarking tool for approximate nearest neighbor algorithms,”

[78] [78]

ANN-Benchmarks: A Benchmarking Tool for Approximate Nearest Neighbor Algorithms

[Online]. Available: https://doi.org/10.48550/arXiv.1807.05614

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1807.05614

[79] [79]

On- tology generation using large language models

G. Pandit, M. Roder, and A.-C. Ngonga Ngomo, “Evaluating approximate nearest neighbour search systems on knowledge graph embeddings,” inThe Semantic Web: 22nd European Semantic Web Conference, ESWC 2025, Portoroz, Slovenia, June 1–5, 2025, Proceed- ings, Part I. Berlin, Heidelberg: Springer-Verlag, 2025, pp. 59–76. [Online]. Available: https://doi.org/10....

work page doi:10.1007/978-3-031-94575-5 2025

[80] [80]

RAGPerf: An end-to-end benchmarking framework for retrieval-augmented generation systems,

S. Li, Y . Zhou, Y . Xu, K. Chen, D. Waddington, S. Sundararaman, H. Franke, and J. Huang, “RAGPerf: An end-to-end benchmarking framework for retrieval-augmented generation systems,” 2026. [Online]. Available: https://doi.org/10.48550/arXiv.2603.10765

work page doi:10.48550/arxiv.2603.10765 2026