pith. sign in

arxiv: 1907.02900 · v1 · pith:GAKHL4WHnew · submitted 2019-07-05 · 💻 cs.DS · cs.DB· cs.DC

HashGraph -- Scalable Hash Tables Using A Sparse Graph Data Structure

Pith reviewed 2026-05-25 01:51 UTC · model grok-4.3

classification 💻 cs.DS cs.DBcs.DC
keywords hash tablessparse graphscollision resolutionGPU computingstatic data structuresscalable indexingconstant-time lookups
0
0 comments X

The pith

HashGraph builds hash tables as sparse graphs to handle collisions without chaining or open addressing while remaining fast at any load factor.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces HashGraph, a method that represents hash table collisions using a sparse graph data structure instead of traditional chaining or open addressing. It shows two variants: a basic construction algorithm and an optimized one with better memory access. For static inputs, the approach claims to maintain high performance regardless of the number of collisions per bucket or the overall load factor, achieving build rates of 2.5 billion keys per second and similar probe rates on GPUs while outperforming existing implementations.

Core claim

HashGraph models the mapping of keys to hash buckets as a sparse graph in which vertices correspond to unique hash values and edges encode collision relationships, supporting a dedicated probing algorithm that achieves constant-time lookups independent of collision multiplicity or load factor for static data.

What carries the argument

The sparse graph representation of hash collisions, where vertices stand for hash values and edges capture bucket assignments to enable efficient construction and graph-based probing.

If this is right

  • Hash table construction and probing can sustain rates above 2 billion keys per second on GPUs for static inputs.
  • Performance stays constant even when individual buckets hold large numbers of colliding keys.
  • The method works across a wide range of load factors without the slowdowns seen in other collision-resolution schemes.
  • Recent advances in dynamic graph structures could allow extension beyond the current static-input limitation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The graph encoding might enable new parallel query patterns that traverse collision edges for aggregate operations rather than single lookups.
  • Similar sparse-graph modeling could be tested on CPU architectures to check whether the memory-access improvements generalize beyond the reported GPU results.
  • Treating collisions as explicit edges suggests the possibility of applying graph compression techniques to further reduce memory use in high-collision regimes.

Load-bearing premise

A sparse graph of collisions can be constructed and traversed efficiently for arbitrary numbers of collisions per bucket while remaining independent of load factor.

What would settle it

Measure build and probe times while systematically increasing the average collisions per bucket from 1 to several hundred; constant times would support the claim, while proportional growth would falsify it.

read the original abstract

Hash tables are ubiquitous and used in a wide range of applications for efficient probing of large and unsorted data. If designed properly, hash-tables can enable efficients look ups in a constant number of operations or commonly referred to as O(1) operations. As data sizes continue to grow and data becomes less structured (as is common for big-data applications), the need for efficient and scalable hash table also grows. In this paper we introduce HashGraph, a new scalable approach for building hash tables that uses concepts taken from sparse graph representations--hence the name HashGraph. We show two different variants of HashGraph, a simple algorithm that outlines the method to create the hash-table and an advanced method that creates the hash table in a more efficient manner (with an improved memory access pattern). HashGraph shows a new way to deal with hash-collisions that does not use "open-addressing" or "chaining", yet has all the benefits of both these approaches. HashGraph currently works for static inputs, though recent progress with dynamic graph data structures suggest that HashGraph might be extended to dynamic inputs as well. We show that HashGraph can deal with a large number of hash-values per entry without loss of performance as most open-addressing and chaining approaches have. Further, we show that HashGraph is indifferent to the load-factor. Lastly, we show a new probing algorithm for the second phase of value lookups. Given the above, HashGraph is extremely fast and outperforms several state of the art hash-table implementations. The implementation of HashGraph in this paper is for NVIDIA GPUs, though HashGraph is not architecture dependent. Using a NVIDIA GV100 GPU, HashGraph is anywhere from 2X-8X faster than cuDPP, WarpDrive, and cuDF. HashGraph is able to build a hash-table at a rate of 2.5 billion keys per second and can probe at nearly the same rate.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces HashGraph, a hash-table construction technique based on sparse graph representations that claims to handle collisions without using open-addressing or chaining while retaining the benefits of both. It presents a simple algorithm and an advanced variant with improved memory access, asserts indifference to load factor and to the number of collisions per bucket, introduces a new second-phase probing method, and reports GPU performance of 2.5 billion keys per second for both build and probe, outperforming cuDPP, WarpDrive, and cuDF by 2-8X on an NVIDIA GV100. The work is restricted to static inputs.

Significance. If the algorithmic construction, complexity claims, and empirical results can be substantiated, HashGraph would constitute a new collision-resolution paradigm with potential advantages for high-load-factor or high-collision workloads in big-data settings; the reported build rates and architecture independence would be practically relevant.

major comments (3)
  1. [Abstract] Abstract and throughout: the central claim that a sparse-graph representation yields build and probe times/memory independent of the number of collisions per bucket (i.e., node degree) and of load factor is asserted without any complexity statement, adjacency-list layout description, or pseudocode. Standard CSR or edge-list representations would incur linear dependence on degree, directly contradicting the claimed benefits over chaining.
  2. [Abstract] Abstract: no construction algorithm, data-layout details, or analysis of the “advanced method” is supplied, so the asserted improvement in memory-access pattern and the new second-phase probe cannot be evaluated; these omissions are load-bearing for the performance and scalability claims.
  3. [Abstract] Abstract: the reported 2.5 billion keys/sec build rate and 2-8X speedups are presented without experimental methodology, input distributions, tested load factors, collision multiplicities, or comparison baselines, rendering the empirical claims unverifiable.
minor comments (2)
  1. [Abstract] Abstract contains the typo “efficients look ups”.
  2. [Abstract] The statement that HashGraph “might be extended to dynamic inputs” references “recent progress with dynamic graph data structures” but supplies no citations.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We agree that the abstract requires additional supporting details on algorithms, complexity, data layouts, and experimental methodology to substantiate the claims, and we will revise the manuscript accordingly.

read point-by-point responses
  1. Referee: [Abstract] Abstract and throughout: the central claim that a sparse-graph representation yields build and probe times/memory independent of the number of collisions per bucket (i.e., node degree) and of load factor is asserted without any complexity statement, adjacency-list layout description, or pseudocode. Standard CSR or edge-list representations would incur linear dependence on degree, directly contradicting the claimed benefits over chaining.

    Authors: We agree that the abstract asserts the independence claim without the requested supporting material. The manuscript body presents the algorithm, but we will add an explicit complexity analysis and pseudocode showing O(1) build/probe independent of degree and load factor. The representation is a custom sparse structure (not standard CSR) that pre-allocates offsets to avoid linear dependence on bucket degree; we will describe the layout and contrast it with chaining/CSR in the revision. revision: yes

  2. Referee: [Abstract] Abstract: no construction algorithm, data-layout details, or analysis of the “advanced method” is supplied, so the asserted improvement in memory-access pattern and the new second-phase probe cannot be evaluated; these omissions are load-bearing for the performance and scalability claims.

    Authors: The construction for the simple and advanced variants, along with data layout, appears in the manuscript body, but we acknowledge the need for more explicit analysis. We will add pseudocode for both variants, a description of the advanced method's reordered traversal for better locality, and details on the second-phase graph-based probe. These additions will be made in the revised version. revision: yes

  3. Referee: [Abstract] Abstract: the reported 2.5 billion keys/sec build rate and 2-8X speedups are presented without experimental methodology, input distributions, tested load factors, collision multiplicities, or comparison baselines, rendering the empirical claims unverifiable.

    Authors: The experimental section details the methodology, uniform random 64-bit key inputs, load factors from 0.1-0.95, collision multiplicities, and baselines (cuDPP, WarpDrive, cuDF). We will revise the abstract to reference this setup briefly and ensure all parameters are clearly stated upfront. revision: yes

Circularity Check

0 steps flagged

No circularity: algorithmic description and empirical results with no derivations or fitted parameters

full rationale

The paper introduces HashGraph as a new algorithmic construction for hash tables based on sparse graph representations. It describes two variants (simple and advanced), reports performance metrics from GPU implementation, and states properties such as load-factor indifference and handling of high collision counts. No equations, first-principles derivations, parameter fitting, or self-citations appear as load-bearing steps in the provided text. All claims are presented as direct consequences of the algorithm design and benchmark measurements rather than reductions to prior inputs by construction. This is the expected non-finding for a purely algorithmic/empirical systems paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the unelaborated premise that sparse graphs can encode hash tables with the stated benefits; no free parameters or invented physical entities are mentioned.

axioms (1)
  • domain assumption Sparse graph representations can encode hash tables such that collision handling inherits benefits of both chaining and open-addressing without their drawbacks.
    This premise is invoked to justify the new probing algorithm and load-factor indifference.
invented entities (1)
  • HashGraph no independent evidence
    purpose: Hash table data structure realized via sparse graph encoding
    Newly introduced method whose efficiency is asserted but not derived from prior literature in the abstract.

pith-pipeline@v0.9.0 · 5883 in / 1302 out tokens · 41456 ms · 2026-05-25T01:51:20.461996+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages

  1. [1]

    Albert, H

    R. Albert, H. Jeong, and A.-L. Barabási. 1999. Internet: Diameter of the World- Wide Web. Nature 401 (Sept. 1999), 130–131

  2. [2]

    Dan A Alcantara, Andrei Sharf, Fatemeh Abbasinejad, Shubhabrata Sengupta, Michael Mitzenmacher, John D Owens, and Nina Amenta. 2009. Real-time parallel hashing on the GPU. ACM Transactions on Graphics (TOG) 28, 5 (2009), 154

  3. [3]

    Austin Appleby. 2008. Murmurhash 2.0. (2008)

  4. [4]

    Saman Ashkiani, Andrew Davidson, Ulrich Meyer, and John D Owens. 2016. GPU Multisplit. In ACM SIGPLAN Notices, Vol. 51. ACM, 12

  5. [5]

    Saman Ashkiani, Martin Farach-Colton, and John D Owens. 2018. A dynamic hash table for the GPU. In 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, 419–429

  6. [6]

    Cagri Balkesen, Gustavo Alonso, Jens Teubner, and M Tamer Özsu. 2013. Multi- Core, Main-Memory Joins: Sort vs. Hash Revisited. Proceedings of the VLDB Endowment 7, 1 (2013), 85–96

  7. [7]

    Cagri Balkesen, Jens Teubner, Gustavo Alonso, and M Tamer Özsu. 2013. Main- Memory Hash Joins on Multi-Core CPUs: Tuning to the Underlying Hardware. In IEEE 29th International Conference on Data Engineering (ICDE) . IEEE, 362–373

  8. [8]

    Claude Barthels, Ingo Müller, Timo Schneider, Gustavo Alonso, and Torsten Hoefler. 2017. Distributed Join Algorithms on Thousands of Cores. Proceedings of the VLDB Endowment 10, 5 (2017), 517–528

  9. [9]

    Edward W Bethel, Luke J Gosink, Kesheng Wu, Edward Wes Bethel, John D Owens, and Kenneth I Joy. 2008. Bin-Hash Indexing: A Parallel Method for Fast Query Processing. Technical Report. Lawrence Berkeley National Lab.(LBNL), Berkeley, CA (United States)

  10. [10]

    Mauro Bisson and Massimiliano Fatica. 2017. High Performance Exact Triangle Counting on GPUs. IEEE Transactions on Parallel and Distributed Systems 28, 12 (2017), 3501–3510

  11. [11]

    Spyros Blanas, Yinan Li, and Jignesh M Patel. 2011. Design and Evaluation of Main Memory Hash Join Algorithms for Multi-Core CPUs. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of data . ACM, 37–48

  12. [12]

    Guy E Blelloch. 1990. Prefix Sums and Their Applications . Technical Report. Citeseer

  13. [13]

    Busato, O

    F. Busato, O. Green, N. Bombieri, and D.A. Bader. 2018. Hornet: An Efficient Data Structure for Dynamic Sparse Graphs and Matrices on GPUs. In IEEE Proc. High Performance Extreme Computing (HPEC) . Waltham, MA

  14. [14]

    Green and D.A

    O. Green and D.A. Bader. 2016. cuSTINGER: Supporting Dynamic Graph Algo- rithms for GPUS. In IEEE Proc. High Performance Extreme Computing (HPEC) . Waltham, MA

  15. [15]

    Mark Harris, John Owens, Shubho Sengupta, Yao Zhang, and Andrew Davidson

  16. [16]

    CUDPP: CUDA Data Parallel Primitives library. (2007)

  17. [17]

    Mark Harris, Shubhabrata Sengupta, and John D Owens. 2007. Parallel prefix sum (scan) with CUDA. GPU gems 3, 39 (2007), 851–876

  18. [18]

    Daniel Jünger, Christian Hundt, and Bertil Schmidt. 2018. WarpDrive: Massively Parallel Hashing on Multi-GPU Nodes. In 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS) . IEEE, 441–450

  19. [19]

    Tomas Karnagel, Rene Mueller, and Guy M Lohman. 2015. Optimizing GPU- accelerated Group-By and Aggregation. ADMS at VLDB 8 (2015), 20

  20. [20]

    Farzad Khorasani, Mehmet E Belviranli, Rajiv Gupta, and Laxmi N Bhuyan

  21. [21]

    In International Conference on Parallel Architecture and Compilation (PACT)

    Stadium Hashing: Scalable and Flexible Hashing on Gpus. In International Conference on Parallel Architecture and Compilation (PACT) . IEEE, 63–74

  22. [22]

    Changkyu Kim, Tim Kaldewey, Victor W Lee, Eric Sedlar, Anthony D Nguyen, Nadathur Satish, Jatin Chhugani, Andrea Di Blas, and Pradeep Dubey. 2009. Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs. Proceedings of the VLDB Endowment 2, 2 (2009), 1378–1389

  23. [23]

    Ping Li, Anshumali Shrivastava, and Christian A Konig. 2012. GPU-based Minwise Hashing. In Proceedings of the 21st International Conference on World Wide Web . ACM, 565–566

  24. [24]

    Peter Macko, Virendra J Marathe, Daniel W Margo, and Margo I Seltzer. 2015. LLAMA: Efficient Graph Analytics Using Large Multiversioned Arrays. In 31st IEEE Int’l Conf. on Data Engineering (ICDE). 363–374

  25. [25]

    Tobias Maier, Peter Sanders, and Roman Dementiev. 2016. Concurrent Hash Tables: Fast and General?(!). In ACM SIGPLAN Notices, Vol. 51. ACM, 34

  26. [26]

    NVIDIA. 2018. CUDF. (2018)

  27. [27]

    Rasmus Pagh and Flemming Friche Rodler. 2004. Cuckoo Hashing. Journal of Algorithms 51, 2 (2004), 122–144

  28. [28]

    Tony C Pan, Sanchit Misra, and Srinivas Aluru. 2018. Optimizing High Perfor- mance Distributed Memory Parallel Hash Tables for DNA k-mer Counting. In Optimizing High Performance Distributed Memory Parallel Hash Tables for DNA k-mer Counting. IEEE, 0

  29. [29]

    Andrea W Richa, M Mitzenmacher, and R Sitaraman. 2001. The Power of Two Ran- dom Choices: A Survey of Techniques and Results. Combinatorial Optimization 9 (2001), 255–304

  30. [30]

    Dipanjan Sengupta, Narayanan Sundaram, Xia Zhu, Theodore L Willke, Jeffrey Young, Matthew Wolf, and Karsten Schwan. 2016. Graphin: An Online High Performance Incremental Graph Processing Framework. In European Conference on Parallel Processing. Springer, 319–333

  31. [31]

    Shubhabrata Sengupta, Mark Harris, Yao Zhang, and John D Owens. 2007. Scan Primitives for GPU Computing. In Graphics hardware. 97–106

  32. [32]

    Martin Winter, Rhaleb Zayer, and Markus Steinberger. 2017. Autonomous, Inde- pendent Management of Dynamic Graphs on GPUs. In International Supercom- puting Conference. Springer, 97–119. 11