SwarmIO: Towards 100 Million IOPS SSD Emulation for Next-generation GPU-centric Storage Systems
Pith reviewed 2026-05-10 18:28 UTC · model grok-4.3
The pith
SwarmIO models IOPS-optimized SSDs at up to 40 MIOPS, with a 303.9x speedup over the state-of-the-art baseline SSD emulator under GPU-initiated I/O.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SwarmIO is an SSD emulator for massively parallel, GPU-centric storage that faithfully models IOPS-optimized SSDs at target performance levels of up to 40 MIOPS, achieving a 303.9x speedup over the state-of-the-art baseline SSD emulator under GPU-initiated I/O. It further demonstrates utility through a vector search case study showing that increasing SSD IOPS from 2.5 MIOPS to 40 MIOPS yields an average end-to-end speedup of up to 9.7x.
What carries the argument
The SwarmIO emulator architecture carries the argument: it improves frontend scalability for massive request streams, reduces software overhead in emulating GPU-initiated I/O control and data paths, and lowers timing-model maintenance overhead at high request rates.
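The third mechanism can be pictured with a toy timing model. The sketch below is not SwarmIO's implementation; it only illustrates the general idea of amortizing timing-state updates over a group of requests. The per-channel `busy_until` timestamp, the fixed `SERVICE_NS` service time, and both function names are assumptions made for illustration, and every request in a batch is assumed to arrive at the same time `now_ns`.

```python
from collections import defaultdict

SERVICE_NS = 1_000  # hypothetical fixed service time per read, per channel


def per_request_update(busy_until, batch, now_ns):
    """Baseline: read and write the channel state once per request."""
    done = []
    for channel in batch:
        start = max(now_ns, busy_until.get(channel, 0))
        busy_until[channel] = start + SERVICE_NS
        done.append(busy_until[channel])
    return done


def aggregated_update(busy_until, batch, now_ns):
    """Batched: read and write the channel state once per channel per batch."""
    positions = defaultdict(list)
    for i, channel in enumerate(batch):
        positions[channel].append(i)

    done = [0] * len(batch)
    for channel, idxs in positions.items():
        start = max(now_ns, busy_until.get(channel, 0))
        # The k-th request queued on this channel finishes k service times later.
        for k, i in enumerate(idxs, start=1):
            done[i] = start + k * SERVICE_NS
        busy_until[channel] = start + len(idxs) * SERVICE_NS
    return done


if __name__ == "__main__":
    batch = [0, 1, 0, 2, 1, 0]  # channel id of each request in one batch
    a, b = {}, {}
    assert per_request_update(a, batch, 500) == aggregated_update(b, batch, 500)
    assert a == b
```

Under these assumptions both functions produce the same completion times, but the batched version touches shared state once per channel rather than once per request. Whether that equivalence holds for SwarmIO's real timing model is exactly the fidelity question raised later in this review.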
If this is right
- End-to-end quantitative evaluation of IOPS-optimized GPU-centric storage systems is now possible without physical hardware.
- Designers can measure the performance effects of scaling SSD IOPS from 2.5 MIOPS to 40 MIOPS in applications such as vector search.
- The emulator supports exploration of next-generation storage architectures with even higher IOPS targets.
- GPU storage system prototypes can be tested and iterated rapidly before hardware availability.
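One way to read the vector search numbers is through Amdahl's law. The sketch below is a back-of-the-envelope interpretation, not an analysis from the paper. It assumes run time splits into a part that scales perfectly with SSD IOPS and a part that does not scale at all, and it treats the reported 9.7x as a single observed speedup. The function name `implied_io_fraction` is hypothetical.

```python
def implied_io_fraction(iops_ratio, observed_speedup):
    """Solve Amdahl's law, S = 1 / ((1 - f) + f / r), for the scalable fraction f."""
    return (1 - 1 / observed_speedup) / (1 - 1 / iops_ratio)


if __name__ == "__main__":
    iops_ratio = 40 / 2.5  # 16x more SSD IOPS
    f = implied_io_fraction(iops_ratio, 9.7)
    print(f"implied IOPS-bound fraction of baseline run time: {f:.3f}")
```

Under these assumptions, a 16x IOPS increase producing a 9.7x end-to-end speedup corresponds to roughly 96% of baseline run time being IOPS-bound. The remaining few percent would cap further gains no matter how far IOPS is pushed.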
Where Pith is reading between the lines
- The approach may extend to emulating other massively parallel I/O patterns in non-GPU accelerators.
- Faster emulation cycles could accelerate hardware-software co-design for data-center GPU storage.
- Observed application speedups imply that real SSD IOPS gains would produce multiplicative benefits in GPU workloads.
- The title's reference to 100 million IOPS suggests the current 40 MIOPS target is an intermediate step toward higher rates.
Load-bearing premise
The timing models and overhead reductions in SwarmIO accurately capture real GPU-initiated I/O behavior at high request rates without introducing significant emulation artifacts or inaccuracies.
What would settle it
Direct comparison of SwarmIO's latency and throughput predictions against measurements from physical high-IOPS SSD hardware running identical GPU-initiated I/O workloads near 40 MIOPS would confirm or refute the model's fidelity.
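Such a comparison could be scored with a simple relative-error check. The sketch below is a minimal illustration: the function names, the 5% default tolerance, and the sample values in the usage section are placeholders chosen for the example, not measurements or thresholds from the paper.

```python
def relative_errors(predicted, measured):
    """Per-point relative error of emulator predictions against hardware measurements."""
    if len(predicted) != len(measured):
        raise ValueError("predicted and measured must have the same length")
    return [abs(p - m) / m for p, m in zip(predicted, measured)]


def within_tolerance(predicted, measured, tolerance=0.05):
    """True if the worst-case relative error stays within the tolerance."""
    return max(relative_errors(predicted, measured)) <= tolerance


if __name__ == "__main__":
    # Placeholder values only: throughput in MIOPS at three offered loads.
    emulated = [9.8, 19.5, 38.0]
    hardware = [10.0, 20.0, 40.0]
    print(relative_errors(emulated, hardware))
    print(within_tolerance(emulated, hardware))
```

The same check applies to latency percentiles. The key requirement is that both series come from identical GPU-initiated I/O workloads.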
Original abstract
GPU-initiated I/O has emerged as a key mechanism for achieving high-throughput storage access by leveraging massive GPU thread-level parallelism, while recent industry trends point toward SSDs optimized for ultra-high random-read IOPS. Together, these trends are enabling the emergence of IOPS-optimized, GPU-centric storage systems. Despite this momentum, no existing framework enables quantitative end-to-end evaluation of storage systems optimized for GPU-initiated I/O. While conventional SSD emulators provide a promising path toward end-to-end modeling in traditional storage systems, they face three key challenges in this GPU-centric setting: limited frontend scalability for ingesting massive request streams, high software overhead in emulating GPU-initiated I/O control and data paths, and excessive timing-model maintenance overhead at extremely high I/O request rates. We propose SwarmIO, an SSD emulator for massively parallel, GPU-centric storage. SwarmIO faithfully models IOPS-optimized SSDs at target performance levels of up to 40 MIOPS, achieving a 303.9x speedup over the state-of-the-art baseline SSD emulator under GPU-initiated I/O. We further demonstrate its utility through a vector search case study, showing that increasing SSD IOPS from 2.5 MIOPS to 40 MIOPS yields an average end-to-end speedup of up to 9.7x.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents SwarmIO, an SSD emulator for GPU-centric storage systems supporting GPU-initiated I/O. It identifies three limitations in prior emulators (frontend scalability for massive request streams, software overhead on GPU I/O paths, and timing-model maintenance at high rates) and claims to address them, enabling faithful modeling of IOPS-optimized SSDs up to 40 MIOPS. Key results include a 303.9x speedup over the state-of-the-art baseline emulator under GPU-initiated I/O and a vector-search case study showing up to 9.7x end-to-end speedup when scaling SSD IOPS from 2.5 MIOPS to 40 MIOPS.
Significance. If the timing models prove accurate, SwarmIO would be a significant contribution by enabling quantitative end-to-end evaluation of emerging GPU-centric storage architectures at performance levels (40+ MIOPS) that are currently impractical to prototype or simulate with existing tools. This could accelerate design exploration in high-performance computing and AI workloads that rely on massive parallel random-read I/O.
major comments (2)
- [Results/Evaluation section] The central claim that SwarmIO 'faithfully models' IOPS-optimized SSDs at up to 40 MIOPS (including flash channel contention and controller effects under massive GPU thread parallelism) is load-bearing for the reported 303.9x speedup and 9.7x case-study gain, yet the manuscript provides no direct validation against physical SSD hardware at target rates with GPU-initiated I/O. If comparisons are limited to other emulators, synthetic traces, or lower-rate regimes, the speedup figures may reflect model simplifications rather than real-system fidelity.
- [Abstract and §1] The assertion of 'no existing framework' for quantitative end-to-end evaluation of GPU-initiated I/O storage systems requires explicit comparison to the closest prior GPU-aware or high-IOPS emulators; without this, it is unclear whether SwarmIO's overhead reductions are incremental or fundamentally new.
minor comments (2)
- [Title and Abstract] The title references '100 Million IOPS' while all concrete claims and results target 40 MIOPS; clarifying the gap between aspirational and demonstrated performance would improve precision.
- [Figures/Tables] Figure and table captions should explicitly state whether error bars or confidence intervals are shown and what baseline configuration (e.g., number of GPU threads, request pattern) was used for the 303.9x measurement.
Simulated Author's Rebuttal
Thank you for the constructive feedback on our manuscript. We address each major comment point by point below, proposing targeted revisions to strengthen the presentation of our contributions while maintaining accuracy.
- Referee: [Results/Evaluation section] The central claim that SwarmIO 'faithfully models' IOPS-optimized SSDs at up to 40 MIOPS (including flash channel contention and controller effects under massive GPU thread parallelism) is load-bearing for the reported 303.9x speedup and 9.7x case-study gain, yet the manuscript provides no direct validation against physical SSD hardware at target rates with GPU-initiated I/O. If comparisons are limited to other emulators, synthetic traces, or lower-rate regimes, the speedup figures may reflect model simplifications rather than real-system fidelity.
Authors: We acknowledge the value of direct hardware validation at target rates. Such validation is inherently limited because IOPS-optimized SSDs with native GPU-initiated I/O support at 40 MIOPS scale are emerging technologies not yet available for comprehensive end-to-end benchmarking. Our timing models are derived from commercial SSD datasheets, flash channel specifications, and controller behaviors documented in prior literature. We have performed validation against available lower-rate physical SSDs and synthetic traces that reproduce known contention effects. In revision, we will expand the evaluation section with additional details on model derivation, low-rate hardware comparisons, and an explicit limitations discussion on high-rate fidelity. This is a partial revision as we cannot fabricate unavailable hardware data. revision: partial
- Referee: [Abstract and §1] The assertion of 'no existing framework' for quantitative end-to-end evaluation of GPU-initiated I/O storage systems requires explicit comparison to the closest prior GPU-aware or high-IOPS emulators; without this, it is unclear whether SwarmIO's overhead reductions are incremental or fundamentally new.
Authors: We will revise the abstract and Section 1 to include an explicit comparison to the closest prior emulators (both high-IOPS and any GPU-aware variants). Our claim centers on the absence of any framework that simultaneously supports GPU-initiated I/O, scales to 40+ MIOPS, and maintains low software overhead on the GPU path. We will add a table or paragraph differentiating SwarmIO from related work on these axes to clarify the novelty. revision: yes
Circularity Check
No circularity: performance claims are measured speedups against external baseline
The paper's derivation chain consists of engineering solutions to three stated challenges (frontend scalability, GPU I/O path overhead, timing-model maintenance) followed by direct benchmarking of the resulting emulator against a state-of-the-art baseline. Reported figures (303.9x speedup, 9.7x end-to-end gain) are presented as empirical measurements on synthetic and case-study workloads, not as outputs of fitted parameters or self-referential equations. No self-definitional steps, fitted-input predictions, or load-bearing self-citations appear in the modeling or evaluation sections; the timing model is described as an implementation artifact whose accuracy is asserted via comparison to the external baseline rather than by construction.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (tag: unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  "SwarmIO faithfully models IOPS-optimized SSDs at target performance levels of up to 40 MIOPS, achieving a 303.9x speedup over the state-of-the-art baseline SSD emulator under GPU-initiated I/O."
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  "We introduce an aggregated timing model update mechanism that amortizes state management overhead across a group of requests while preserving high-fidelity timing emulation."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.