HashGraph -- Scalable Hash Tables Using A Sparse Graph Data Structure
Pith reviewed 2026-05-25 01:51 UTC · model grok-4.3
The pith
HashGraph builds hash tables as sparse graphs to handle collisions without chaining or open addressing while remaining fast at any load factor.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
HashGraph models the mapping of keys to hash buckets as a sparse graph in which vertices correspond to unique hash values and edges encode collision relationships, supporting a dedicated probing algorithm that achieves constant-time lookups independent of collision multiplicity or load factor for static data.
What carries the argument
The sparse graph representation of hash collisions, where vertices stand for hash values and edges capture bucket assignments to enable efficient construction and graph-based probing.
If this is right
- Hash table construction and probing can sustain rates above 2 billion keys per second on GPUs for static inputs.
- Performance stays constant even when individual buckets hold large numbers of colliding keys.
- The method works across a wide range of load factors without the slowdowns seen in other collision-resolution schemes.
- Recent advances in dynamic graph structures could allow extension beyond the current static-input limitation.
Where Pith is reading between the lines
- The graph encoding might enable new parallel query patterns that traverse collision edges for aggregate operations rather than single lookups.
- Similar sparse-graph modeling could be tested on CPU architectures to check whether the memory-access improvements generalize beyond the reported GPU results.
- Treating collisions as explicit edges suggests the possibility of applying graph compression techniques to further reduce memory use in high-collision regimes.
Load-bearing premise
A sparse graph of collisions can be constructed and traversed efficiently for arbitrary numbers of collisions per bucket while remaining independent of load factor.
What would settle it
Measure build and probe times while systematically increasing the average collisions per bucket from 1 to several hundred; constant times would support the claim, while proportional growth would falsify it.
read the original abstract
Hash tables are ubiquitous and used in a wide range of applications for efficient probing of large and unsorted data. If designed properly, hash-tables can enable efficients look ups in a constant number of operations or commonly referred to as O(1) operations. As data sizes continue to grow and data becomes less structured (as is common for big-data applications), the need for efficient and scalable hash table also grows. In this paper we introduce HashGraph, a new scalable approach for building hash tables that uses concepts taken from sparse graph representations--hence the name HashGraph. We show two different variants of HashGraph, a simple algorithm that outlines the method to create the hash-table and an advanced method that creates the hash table in a more efficient manner (with an improved memory access pattern). HashGraph shows a new way to deal with hash-collisions that does not use "open-addressing" or "chaining", yet has all the benefits of both these approaches. HashGraph currently works for static inputs, though recent progress with dynamic graph data structures suggest that HashGraph might be extended to dynamic inputs as well. We show that HashGraph can deal with a large number of hash-values per entry without loss of performance as most open-addressing and chaining approaches have. Further, we show that HashGraph is indifferent to the load-factor. Lastly, we show a new probing algorithm for the second phase of value lookups. Given the above, HashGraph is extremely fast and outperforms several state of the art hash-table implementations. The implementation of HashGraph in this paper is for NVIDIA GPUs, though HashGraph is not architecture dependent. Using a NVIDIA GV100 GPU, HashGraph is anywhere from 2X-8X faster than cuDPP, WarpDrive, and cuDF. HashGraph is able to build a hash-table at a rate of 2.5 billion keys per second and can probe at nearly the same rate.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces HashGraph, a hash-table construction technique based on sparse graph representations that claims to handle collisions without using open-addressing or chaining while retaining the benefits of both. It presents a simple algorithm and an advanced variant with improved memory access, asserts indifference to load factor and to the number of collisions per bucket, introduces a new second-phase probing method, and reports GPU performance of 2.5 billion keys per second for both build and probe, outperforming cuDPP, WarpDrive, and cuDF by 2-8X on an NVIDIA GV100. The work is restricted to static inputs.
Significance. If the algorithmic construction, complexity claims, and empirical results can be substantiated, HashGraph would constitute a new collision-resolution paradigm with potential advantages for high-load-factor or high-collision workloads in big-data settings; the reported build rates and architecture independence would be practically relevant.
major comments (3)
- [Abstract] Abstract and throughout: the central claim that a sparse-graph representation yields build and probe times/memory independent of the number of collisions per bucket (i.e., node degree) and of load factor is asserted without any complexity statement, adjacency-list layout description, or pseudocode. Standard CSR or edge-list representations would incur linear dependence on degree, directly contradicting the claimed benefits over chaining.
- [Abstract] Abstract: no construction algorithm, data-layout details, or analysis of the “advanced method” is supplied, so the asserted improvement in memory-access pattern and the new second-phase probe cannot be evaluated; these omissions are load-bearing for the performance and scalability claims.
- [Abstract] Abstract: the reported 2.5 billion keys/sec build rate and 2-8X speedups are presented without experimental methodology, input distributions, tested load factors, collision multiplicities, or comparison baselines, rendering the empirical claims unverifiable.
minor comments (2)
- [Abstract] Abstract contains the typo “efficients look ups”.
- [Abstract] The statement that HashGraph “might be extended to dynamic inputs” references “recent progress with dynamic graph data structures” but supplies no citations.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive report. We agree that the abstract requires additional supporting details on algorithms, complexity, data layouts, and experimental methodology to substantiate the claims, and we will revise the manuscript accordingly.
read point-by-point responses
-
Referee: [Abstract] Abstract and throughout: the central claim that a sparse-graph representation yields build and probe times/memory independent of the number of collisions per bucket (i.e., node degree) and of load factor is asserted without any complexity statement, adjacency-list layout description, or pseudocode. Standard CSR or edge-list representations would incur linear dependence on degree, directly contradicting the claimed benefits over chaining.
Authors: We agree that the abstract asserts the independence claim without the requested supporting material. The manuscript body presents the algorithm, but we will add an explicit complexity analysis and pseudocode showing O(1) build/probe independent of degree and load factor. The representation is a custom sparse structure (not standard CSR) that pre-allocates offsets to avoid linear dependence on bucket degree; we will describe the layout and contrast it with chaining/CSR in the revision. revision: yes
-
Referee: [Abstract] Abstract: no construction algorithm, data-layout details, or analysis of the “advanced method” is supplied, so the asserted improvement in memory-access pattern and the new second-phase probe cannot be evaluated; these omissions are load-bearing for the performance and scalability claims.
Authors: The construction for the simple and advanced variants, along with data layout, appears in the manuscript body, but we acknowledge the need for more explicit analysis. We will add pseudocode for both variants, a description of the advanced method's reordered traversal for better locality, and details on the second-phase graph-based probe. These additions will be made in the revised version. revision: yes
-
Referee: [Abstract] Abstract: the reported 2.5 billion keys/sec build rate and 2-8X speedups are presented without experimental methodology, input distributions, tested load factors, collision multiplicities, or comparison baselines, rendering the empirical claims unverifiable.
Authors: The experimental section details the methodology, uniform random 64-bit key inputs, load factors from 0.1-0.95, collision multiplicities, and baselines (cuDPP, WarpDrive, cuDF). We will revise the abstract to reference this setup briefly and ensure all parameters are clearly stated upfront. revision: yes
Circularity Check
No circularity: algorithmic description and empirical results with no derivations or fitted parameters
full rationale
The paper introduces HashGraph as a new algorithmic construction for hash tables based on sparse graph representations. It describes two variants (simple and advanced), reports performance metrics from GPU implementation, and states properties such as load-factor indifference and handling of high collision counts. No equations, first-principles derivations, parameter fitting, or self-citations appear as load-bearing steps in the provided text. All claims are presented as direct consequences of the algorithm design and benchmark measurements rather than reductions to prior inputs by construction. This is the expected non-finding for a purely algorithmic/empirical systems paper.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Sparse graph representations can encode hash tables such that collision handling inherits benefits of both chaining and open-addressing without their drawbacks.
invented entities (1)
-
HashGraph
no independent evidence
Reference graph
Works this paper leans on
- [1]
-
[2]
Dan A Alcantara, Andrei Sharf, Fatemeh Abbasinejad, Shubhabrata Sengupta, Michael Mitzenmacher, John D Owens, and Nina Amenta. 2009. Real-time parallel hashing on the GPU. ACM Transactions on Graphics (TOG) 28, 5 (2009), 154
work page 2009
-
[3]
Austin Appleby. 2008. Murmurhash 2.0. (2008)
work page 2008
-
[4]
Saman Ashkiani, Andrew Davidson, Ulrich Meyer, and John D Owens. 2016. GPU Multisplit. In ACM SIGPLAN Notices, Vol. 51. ACM, 12
work page 2016
-
[5]
Saman Ashkiani, Martin Farach-Colton, and John D Owens. 2018. A dynamic hash table for the GPU. In 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, 419–429
work page 2018
-
[6]
Cagri Balkesen, Gustavo Alonso, Jens Teubner, and M Tamer Özsu. 2013. Multi- Core, Main-Memory Joins: Sort vs. Hash Revisited. Proceedings of the VLDB Endowment 7, 1 (2013), 85–96
work page 2013
-
[7]
Cagri Balkesen, Jens Teubner, Gustavo Alonso, and M Tamer Özsu. 2013. Main- Memory Hash Joins on Multi-Core CPUs: Tuning to the Underlying Hardware. In IEEE 29th International Conference on Data Engineering (ICDE) . IEEE, 362–373
work page 2013
-
[8]
Claude Barthels, Ingo Müller, Timo Schneider, Gustavo Alonso, and Torsten Hoefler. 2017. Distributed Join Algorithms on Thousands of Cores. Proceedings of the VLDB Endowment 10, 5 (2017), 517–528
work page 2017
-
[9]
Edward W Bethel, Luke J Gosink, Kesheng Wu, Edward Wes Bethel, John D Owens, and Kenneth I Joy. 2008. Bin-Hash Indexing: A Parallel Method for Fast Query Processing. Technical Report. Lawrence Berkeley National Lab.(LBNL), Berkeley, CA (United States)
work page 2008
-
[10]
Mauro Bisson and Massimiliano Fatica. 2017. High Performance Exact Triangle Counting on GPUs. IEEE Transactions on Parallel and Distributed Systems 28, 12 (2017), 3501–3510
work page 2017
-
[11]
Spyros Blanas, Yinan Li, and Jignesh M Patel. 2011. Design and Evaluation of Main Memory Hash Join Algorithms for Multi-Core CPUs. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of data . ACM, 37–48
work page 2011
-
[12]
Guy E Blelloch. 1990. Prefix Sums and Their Applications . Technical Report. Citeseer
work page 1990
- [13]
-
[14]
O. Green and D.A. Bader. 2016. cuSTINGER: Supporting Dynamic Graph Algo- rithms for GPUS. In IEEE Proc. High Performance Extreme Computing (HPEC) . Waltham, MA
work page 2016
-
[15]
Mark Harris, John Owens, Shubho Sengupta, Yao Zhang, and Andrew Davidson
-
[16]
CUDPP: CUDA Data Parallel Primitives library. (2007)
work page 2007
-
[17]
Mark Harris, Shubhabrata Sengupta, and John D Owens. 2007. Parallel prefix sum (scan) with CUDA. GPU gems 3, 39 (2007), 851–876
work page 2007
-
[18]
Daniel Jünger, Christian Hundt, and Bertil Schmidt. 2018. WarpDrive: Massively Parallel Hashing on Multi-GPU Nodes. In 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS) . IEEE, 441–450
work page 2018
-
[19]
Tomas Karnagel, Rene Mueller, and Guy M Lohman. 2015. Optimizing GPU- accelerated Group-By and Aggregation. ADMS at VLDB 8 (2015), 20
work page 2015
-
[20]
Farzad Khorasani, Mehmet E Belviranli, Rajiv Gupta, and Laxmi N Bhuyan
-
[21]
In International Conference on Parallel Architecture and Compilation (PACT)
Stadium Hashing: Scalable and Flexible Hashing on Gpus. In International Conference on Parallel Architecture and Compilation (PACT) . IEEE, 63–74
-
[22]
Changkyu Kim, Tim Kaldewey, Victor W Lee, Eric Sedlar, Anthony D Nguyen, Nadathur Satish, Jatin Chhugani, Andrea Di Blas, and Pradeep Dubey. 2009. Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs. Proceedings of the VLDB Endowment 2, 2 (2009), 1378–1389
work page 2009
-
[23]
Ping Li, Anshumali Shrivastava, and Christian A Konig. 2012. GPU-based Minwise Hashing. In Proceedings of the 21st International Conference on World Wide Web . ACM, 565–566
work page 2012
-
[24]
Peter Macko, Virendra J Marathe, Daniel W Margo, and Margo I Seltzer. 2015. LLAMA: Efficient Graph Analytics Using Large Multiversioned Arrays. In 31st IEEE Int’l Conf. on Data Engineering (ICDE). 363–374
work page 2015
-
[25]
Tobias Maier, Peter Sanders, and Roman Dementiev. 2016. Concurrent Hash Tables: Fast and General?(!). In ACM SIGPLAN Notices, Vol. 51. ACM, 34
work page 2016
-
[26]
NVIDIA. 2018. CUDF. (2018)
work page 2018
-
[27]
Rasmus Pagh and Flemming Friche Rodler. 2004. Cuckoo Hashing. Journal of Algorithms 51, 2 (2004), 122–144
work page 2004
-
[28]
Tony C Pan, Sanchit Misra, and Srinivas Aluru. 2018. Optimizing High Perfor- mance Distributed Memory Parallel Hash Tables for DNA k-mer Counting. In Optimizing High Performance Distributed Memory Parallel Hash Tables for DNA k-mer Counting. IEEE, 0
work page 2018
-
[29]
Andrea W Richa, M Mitzenmacher, and R Sitaraman. 2001. The Power of Two Ran- dom Choices: A Survey of Techniques and Results. Combinatorial Optimization 9 (2001), 255–304
work page 2001
-
[30]
Dipanjan Sengupta, Narayanan Sundaram, Xia Zhu, Theodore L Willke, Jeffrey Young, Matthew Wolf, and Karsten Schwan. 2016. Graphin: An Online High Performance Incremental Graph Processing Framework. In European Conference on Parallel Processing. Springer, 319–333
work page 2016
-
[31]
Shubhabrata Sengupta, Mark Harris, Yao Zhang, and John D Owens. 2007. Scan Primitives for GPU Computing. In Graphics hardware. 97–106
work page 2007
-
[32]
Martin Winter, Rhaleb Zayer, and Markus Steinberger. 2017. Autonomous, Inde- pendent Management of Dynamic Graphs on GPUs. In International Supercom- puting Conference. Springer, 97–119. 11
work page 2017
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.