HashGraph -- Scalable Hash Tables Using A Sparse Graph Data Structure

Oded Green

arxiv: 1907.02900 · v1 · pith:GAKHL4WHnew · submitted 2019-07-05 · 💻 cs.DS · cs.DB· cs.DC

HashGraph -- Scalable Hash Tables Using A Sparse Graph Data Structure

Oded Green This is my paper

Pith reviewed 2026-05-25 01:51 UTC · model grok-4.3

classification 💻 cs.DS cs.DBcs.DC

keywords hash tablessparse graphscollision resolutionGPU computingstatic data structuresscalable indexingconstant-time lookups

0 comments

The pith

HashGraph builds hash tables as sparse graphs to handle collisions without chaining or open addressing while remaining fast at any load factor.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces HashGraph, a method that represents hash table collisions using a sparse graph data structure instead of traditional chaining or open addressing. It shows two variants: a basic construction algorithm and an optimized one with better memory access. For static inputs, the approach claims to maintain high performance regardless of the number of collisions per bucket or the overall load factor, achieving build rates of 2.5 billion keys per second and similar probe rates on GPUs while outperforming existing implementations.

Core claim

HashGraph models the mapping of keys to hash buckets as a sparse graph in which vertices correspond to unique hash values and edges encode collision relationships, supporting a dedicated probing algorithm that achieves constant-time lookups independent of collision multiplicity or load factor for static data.

What carries the argument

The sparse graph representation of hash collisions, where vertices stand for hash values and edges capture bucket assignments to enable efficient construction and graph-based probing.

If this is right

Hash table construction and probing can sustain rates above 2 billion keys per second on GPUs for static inputs.
Performance stays constant even when individual buckets hold large numbers of colliding keys.
The method works across a wide range of load factors without the slowdowns seen in other collision-resolution schemes.
Recent advances in dynamic graph structures could allow extension beyond the current static-input limitation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The graph encoding might enable new parallel query patterns that traverse collision edges for aggregate operations rather than single lookups.
Similar sparse-graph modeling could be tested on CPU architectures to check whether the memory-access improvements generalize beyond the reported GPU results.
Treating collisions as explicit edges suggests the possibility of applying graph compression techniques to further reduce memory use in high-collision regimes.

Load-bearing premise

A sparse graph of collisions can be constructed and traversed efficiently for arbitrary numbers of collisions per bucket while remaining independent of load factor.

What would settle it

Measure build and probe times while systematically increasing the average collisions per bucket from 1 to several hundred; constant times would support the claim, while proportional growth would falsify it.

read the original abstract

Hash tables are ubiquitous and used in a wide range of applications for efficient probing of large and unsorted data. If designed properly, hash-tables can enable efficients look ups in a constant number of operations or commonly referred to as O(1) operations. As data sizes continue to grow and data becomes less structured (as is common for big-data applications), the need for efficient and scalable hash table also grows. In this paper we introduce HashGraph, a new scalable approach for building hash tables that uses concepts taken from sparse graph representations--hence the name HashGraph. We show two different variants of HashGraph, a simple algorithm that outlines the method to create the hash-table and an advanced method that creates the hash table in a more efficient manner (with an improved memory access pattern). HashGraph shows a new way to deal with hash-collisions that does not use "open-addressing" or "chaining", yet has all the benefits of both these approaches. HashGraph currently works for static inputs, though recent progress with dynamic graph data structures suggest that HashGraph might be extended to dynamic inputs as well. We show that HashGraph can deal with a large number of hash-values per entry without loss of performance as most open-addressing and chaining approaches have. Further, we show that HashGraph is indifferent to the load-factor. Lastly, we show a new probing algorithm for the second phase of value lookups. Given the above, HashGraph is extremely fast and outperforms several state of the art hash-table implementations. The implementation of HashGraph in this paper is for NVIDIA GPUs, though HashGraph is not architecture dependent. Using a NVIDIA GV100 GPU, HashGraph is anywhere from 2X-8X faster than cuDPP, WarpDrive, and cuDF. HashGraph is able to build a hash-table at a rate of 2.5 billion keys per second and can probe at nearly the same rate.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

HashGraph claims a graph-based collision method that sidesteps both chaining and open addressing while staying fast at high loads, but the key efficiency claims rest on an unshown sparse representation that must avoid degree-dependent costs.

read the letter

The paper's main contribution is framing hash collisions via a sparse graph structure instead of the usual two families, with two variants (simple and advanced) for static inputs on GPUs. It reports build rates around 2.5 billion keys per second and similar probe rates, plus indifference to load factor and the ability to handle many collisions per bucket without slowdown. That framing is distinct and the performance numbers, if they hold, would be relevant for GPU big-data workloads. The advanced variant's improved memory access pattern is presented as a practical plus. The work is honest about being limited to static cases for now. The soft spot is exactly the one flagged in the stress test: without explicit data layout, adjacency representation, or complexity statements, it is not obvious how the graph encoding keeps build and the new second-phase probe independent of bucket degree or load factor. Standard sparse formats like CSR would still pay linear costs on high-degree nodes, which would undercut the claimed advantages over chaining. The abstract supplies no pseudocode or analysis to resolve this, and the experiments are summarized at a high level. This is aimed at people building high-throughput hash tables for GPUs or similar architectures. It deserves a serious referee because the idea is new enough and the reported speeds high enough that a full version with the missing construction details and reproducible benchmarks would be worth checking, even if revisions are needed.

Referee Report

3 major / 2 minor

Summary. The paper introduces HashGraph, a hash-table construction technique based on sparse graph representations that claims to handle collisions without using open-addressing or chaining while retaining the benefits of both. It presents a simple algorithm and an advanced variant with improved memory access, asserts indifference to load factor and to the number of collisions per bucket, introduces a new second-phase probing method, and reports GPU performance of 2.5 billion keys per second for both build and probe, outperforming cuDPP, WarpDrive, and cuDF by 2-8X on an NVIDIA GV100. The work is restricted to static inputs.

Significance. If the algorithmic construction, complexity claims, and empirical results can be substantiated, HashGraph would constitute a new collision-resolution paradigm with potential advantages for high-load-factor or high-collision workloads in big-data settings; the reported build rates and architecture independence would be practically relevant.

major comments (3)

[Abstract] Abstract and throughout: the central claim that a sparse-graph representation yields build and probe times/memory independent of the number of collisions per bucket (i.e., node degree) and of load factor is asserted without any complexity statement, adjacency-list layout description, or pseudocode. Standard CSR or edge-list representations would incur linear dependence on degree, directly contradicting the claimed benefits over chaining.
[Abstract] Abstract: no construction algorithm, data-layout details, or analysis of the “advanced method” is supplied, so the asserted improvement in memory-access pattern and the new second-phase probe cannot be evaluated; these omissions are load-bearing for the performance and scalability claims.
[Abstract] Abstract: the reported 2.5 billion keys/sec build rate and 2-8X speedups are presented without experimental methodology, input distributions, tested load factors, collision multiplicities, or comparison baselines, rendering the empirical claims unverifiable.

minor comments (2)

[Abstract] Abstract contains the typo “efficients look ups”.
[Abstract] The statement that HashGraph “might be extended to dynamic inputs” references “recent progress with dynamic graph data structures” but supplies no citations.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We agree that the abstract requires additional supporting details on algorithms, complexity, data layouts, and experimental methodology to substantiate the claims, and we will revise the manuscript accordingly.

read point-by-point responses

Referee: [Abstract] Abstract and throughout: the central claim that a sparse-graph representation yields build and probe times/memory independent of the number of collisions per bucket (i.e., node degree) and of load factor is asserted without any complexity statement, adjacency-list layout description, or pseudocode. Standard CSR or edge-list representations would incur linear dependence on degree, directly contradicting the claimed benefits over chaining.

Authors: We agree that the abstract asserts the independence claim without the requested supporting material. The manuscript body presents the algorithm, but we will add an explicit complexity analysis and pseudocode showing O(1) build/probe independent of degree and load factor. The representation is a custom sparse structure (not standard CSR) that pre-allocates offsets to avoid linear dependence on bucket degree; we will describe the layout and contrast it with chaining/CSR in the revision. revision: yes
Referee: [Abstract] Abstract: no construction algorithm, data-layout details, or analysis of the “advanced method” is supplied, so the asserted improvement in memory-access pattern and the new second-phase probe cannot be evaluated; these omissions are load-bearing for the performance and scalability claims.

Authors: The construction for the simple and advanced variants, along with data layout, appears in the manuscript body, but we acknowledge the need for more explicit analysis. We will add pseudocode for both variants, a description of the advanced method's reordered traversal for better locality, and details on the second-phase graph-based probe. These additions will be made in the revised version. revision: yes
Referee: [Abstract] Abstract: the reported 2.5 billion keys/sec build rate and 2-8X speedups are presented without experimental methodology, input distributions, tested load factors, collision multiplicities, or comparison baselines, rendering the empirical claims unverifiable.

Authors: The experimental section details the methodology, uniform random 64-bit key inputs, load factors from 0.1-0.95, collision multiplicities, and baselines (cuDPP, WarpDrive, cuDF). We will revise the abstract to reference this setup briefly and ensure all parameters are clearly stated upfront. revision: yes

Circularity Check

0 steps flagged

No circularity: algorithmic description and empirical results with no derivations or fitted parameters

full rationale

The paper introduces HashGraph as a new algorithmic construction for hash tables based on sparse graph representations. It describes two variants (simple and advanced), reports performance metrics from GPU implementation, and states properties such as load-factor indifference and handling of high collision counts. No equations, first-principles derivations, parameter fitting, or self-citations appear as load-bearing steps in the provided text. All claims are presented as direct consequences of the algorithm design and benchmark measurements rather than reductions to prior inputs by construction. This is the expected non-finding for a purely algorithmic/empirical systems paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the unelaborated premise that sparse graphs can encode hash tables with the stated benefits; no free parameters or invented physical entities are mentioned.

axioms (1)

domain assumption Sparse graph representations can encode hash tables such that collision handling inherits benefits of both chaining and open-addressing without their drawbacks.
This premise is invoked to justify the new probing algorithm and load-factor indifference.

invented entities (1)

HashGraph no independent evidence
purpose: Hash table data structure realized via sparse graph encoding
Newly introduced method whose efficiency is asserted but not derived from prior literature in the abstract.

pith-pipeline@v0.9.0 · 5883 in / 1302 out tokens · 41456 ms · 2026-05-25T01:51:20.461996+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages

[1]

Albert, H

R. Albert, H. Jeong, and A.-L. Barabási. 1999. Internet: Diameter of the World- Wide Web. Nature 401 (Sept. 1999), 130–131

work page 1999
[2]

Dan A Alcantara, Andrei Sharf, Fatemeh Abbasinejad, Shubhabrata Sengupta, Michael Mitzenmacher, John D Owens, and Nina Amenta. 2009. Real-time parallel hashing on the GPU. ACM Transactions on Graphics (TOG) 28, 5 (2009), 154

work page 2009
[3]

Austin Appleby. 2008. Murmurhash 2.0. (2008)

work page 2008
[4]

Saman Ashkiani, Andrew Davidson, Ulrich Meyer, and John D Owens. 2016. GPU Multisplit. In ACM SIGPLAN Notices, Vol. 51. ACM, 12

work page 2016
[5]

Saman Ashkiani, Martin Farach-Colton, and John D Owens. 2018. A dynamic hash table for the GPU. In 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, 419–429

work page 2018
[6]

Cagri Balkesen, Gustavo Alonso, Jens Teubner, and M Tamer Özsu. 2013. Multi- Core, Main-Memory Joins: Sort vs. Hash Revisited. Proceedings of the VLDB Endowment 7, 1 (2013), 85–96

work page 2013
[7]

Cagri Balkesen, Jens Teubner, Gustavo Alonso, and M Tamer Özsu. 2013. Main- Memory Hash Joins on Multi-Core CPUs: Tuning to the Underlying Hardware. In IEEE 29th International Conference on Data Engineering (ICDE) . IEEE, 362–373

work page 2013
[8]

Claude Barthels, Ingo Müller, Timo Schneider, Gustavo Alonso, and Torsten Hoefler. 2017. Distributed Join Algorithms on Thousands of Cores. Proceedings of the VLDB Endowment 10, 5 (2017), 517–528

work page 2017
[9]

Edward W Bethel, Luke J Gosink, Kesheng Wu, Edward Wes Bethel, John D Owens, and Kenneth I Joy. 2008. Bin-Hash Indexing: A Parallel Method for Fast Query Processing. Technical Report. Lawrence Berkeley National Lab.(LBNL), Berkeley, CA (United States)

work page 2008
[10]

Mauro Bisson and Massimiliano Fatica. 2017. High Performance Exact Triangle Counting on GPUs. IEEE Transactions on Parallel and Distributed Systems 28, 12 (2017), 3501–3510

work page 2017
[11]

Spyros Blanas, Yinan Li, and Jignesh M Patel. 2011. Design and Evaluation of Main Memory Hash Join Algorithms for Multi-Core CPUs. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of data . ACM, 37–48

work page 2011
[12]

Guy E Blelloch. 1990. Prefix Sums and Their Applications . Technical Report. Citeseer

work page 1990
[13]

Busato, O

F. Busato, O. Green, N. Bombieri, and D.A. Bader. 2018. Hornet: An Efficient Data Structure for Dynamic Sparse Graphs and Matrices on GPUs. In IEEE Proc. High Performance Extreme Computing (HPEC) . Waltham, MA

work page 2018
[14]

Green and D.A

O. Green and D.A. Bader. 2016. cuSTINGER: Supporting Dynamic Graph Algo- rithms for GPUS. In IEEE Proc. High Performance Extreme Computing (HPEC) . Waltham, MA

work page 2016
[15]

Mark Harris, John Owens, Shubho Sengupta, Yao Zhang, and Andrew Davidson

work page
[16]

CUDPP: CUDA Data Parallel Primitives library. (2007)

work page 2007
[17]

Mark Harris, Shubhabrata Sengupta, and John D Owens. 2007. Parallel prefix sum (scan) with CUDA. GPU gems 3, 39 (2007), 851–876

work page 2007
[18]

Daniel Jünger, Christian Hundt, and Bertil Schmidt. 2018. WarpDrive: Massively Parallel Hashing on Multi-GPU Nodes. In 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS) . IEEE, 441–450

work page 2018
[19]

Tomas Karnagel, Rene Mueller, and Guy M Lohman. 2015. Optimizing GPU- accelerated Group-By and Aggregation. ADMS at VLDB 8 (2015), 20

work page 2015
[20]

Farzad Khorasani, Mehmet E Belviranli, Rajiv Gupta, and Laxmi N Bhuyan

work page
[21]

In International Conference on Parallel Architecture and Compilation (PACT)

Stadium Hashing: Scalable and Flexible Hashing on Gpus. In International Conference on Parallel Architecture and Compilation (PACT) . IEEE, 63–74

work page
[22]

Changkyu Kim, Tim Kaldewey, Victor W Lee, Eric Sedlar, Anthony D Nguyen, Nadathur Satish, Jatin Chhugani, Andrea Di Blas, and Pradeep Dubey. 2009. Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs. Proceedings of the VLDB Endowment 2, 2 (2009), 1378–1389

work page 2009
[23]

Ping Li, Anshumali Shrivastava, and Christian A Konig. 2012. GPU-based Minwise Hashing. In Proceedings of the 21st International Conference on World Wide Web . ACM, 565–566

work page 2012
[24]

Peter Macko, Virendra J Marathe, Daniel W Margo, and Margo I Seltzer. 2015. LLAMA: Efficient Graph Analytics Using Large Multiversioned Arrays. In 31st IEEE Int’l Conf. on Data Engineering (ICDE). 363–374

work page 2015
[25]

Tobias Maier, Peter Sanders, and Roman Dementiev. 2016. Concurrent Hash Tables: Fast and General?(!). In ACM SIGPLAN Notices, Vol. 51. ACM, 34

work page 2016
[26]

NVIDIA. 2018. CUDF. (2018)

work page 2018
[27]

Rasmus Pagh and Flemming Friche Rodler. 2004. Cuckoo Hashing. Journal of Algorithms 51, 2 (2004), 122–144

work page 2004
[28]

Tony C Pan, Sanchit Misra, and Srinivas Aluru. 2018. Optimizing High Perfor- mance Distributed Memory Parallel Hash Tables for DNA k-mer Counting. In Optimizing High Performance Distributed Memory Parallel Hash Tables for DNA k-mer Counting. IEEE, 0

work page 2018
[29]

Andrea W Richa, M Mitzenmacher, and R Sitaraman. 2001. The Power of Two Ran- dom Choices: A Survey of Techniques and Results. Combinatorial Optimization 9 (2001), 255–304

work page 2001
[30]

Dipanjan Sengupta, Narayanan Sundaram, Xia Zhu, Theodore L Willke, Jeffrey Young, Matthew Wolf, and Karsten Schwan. 2016. Graphin: An Online High Performance Incremental Graph Processing Framework. In European Conference on Parallel Processing. Springer, 319–333

work page 2016
[31]

Shubhabrata Sengupta, Mark Harris, Yao Zhang, and John D Owens. 2007. Scan Primitives for GPU Computing. In Graphics hardware. 97–106

work page 2007
[32]

Martin Winter, Rhaleb Zayer, and Markus Steinberger. 2017. Autonomous, Inde- pendent Management of Dynamic Graphs on GPUs. In International Supercom- puting Conference. Springer, 97–119. 11

work page 2017

[1] [1]

Albert, H

R. Albert, H. Jeong, and A.-L. Barabási. 1999. Internet: Diameter of the World- Wide Web. Nature 401 (Sept. 1999), 130–131

work page 1999

[2] [2]

Dan A Alcantara, Andrei Sharf, Fatemeh Abbasinejad, Shubhabrata Sengupta, Michael Mitzenmacher, John D Owens, and Nina Amenta. 2009. Real-time parallel hashing on the GPU. ACM Transactions on Graphics (TOG) 28, 5 (2009), 154

work page 2009

[3] [3]

Austin Appleby. 2008. Murmurhash 2.0. (2008)

work page 2008

[4] [4]

Saman Ashkiani, Andrew Davidson, Ulrich Meyer, and John D Owens. 2016. GPU Multisplit. In ACM SIGPLAN Notices, Vol. 51. ACM, 12

work page 2016

[5] [5]

Saman Ashkiani, Martin Farach-Colton, and John D Owens. 2018. A dynamic hash table for the GPU. In 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, 419–429

work page 2018

[6] [6]

Cagri Balkesen, Gustavo Alonso, Jens Teubner, and M Tamer Özsu. 2013. Multi- Core, Main-Memory Joins: Sort vs. Hash Revisited. Proceedings of the VLDB Endowment 7, 1 (2013), 85–96

work page 2013

[7] [7]

Cagri Balkesen, Jens Teubner, Gustavo Alonso, and M Tamer Özsu. 2013. Main- Memory Hash Joins on Multi-Core CPUs: Tuning to the Underlying Hardware. In IEEE 29th International Conference on Data Engineering (ICDE) . IEEE, 362–373

work page 2013

[8] [8]

Claude Barthels, Ingo Müller, Timo Schneider, Gustavo Alonso, and Torsten Hoefler. 2017. Distributed Join Algorithms on Thousands of Cores. Proceedings of the VLDB Endowment 10, 5 (2017), 517–528

work page 2017

[9] [9]

Edward W Bethel, Luke J Gosink, Kesheng Wu, Edward Wes Bethel, John D Owens, and Kenneth I Joy. 2008. Bin-Hash Indexing: A Parallel Method for Fast Query Processing. Technical Report. Lawrence Berkeley National Lab.(LBNL), Berkeley, CA (United States)

work page 2008

[10] [10]

Mauro Bisson and Massimiliano Fatica. 2017. High Performance Exact Triangle Counting on GPUs. IEEE Transactions on Parallel and Distributed Systems 28, 12 (2017), 3501–3510

work page 2017

[11] [11]

Spyros Blanas, Yinan Li, and Jignesh M Patel. 2011. Design and Evaluation of Main Memory Hash Join Algorithms for Multi-Core CPUs. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of data . ACM, 37–48

work page 2011

[12] [12]

Guy E Blelloch. 1990. Prefix Sums and Their Applications . Technical Report. Citeseer

work page 1990

[13] [13]

Busato, O

F. Busato, O. Green, N. Bombieri, and D.A. Bader. 2018. Hornet: An Efficient Data Structure for Dynamic Sparse Graphs and Matrices on GPUs. In IEEE Proc. High Performance Extreme Computing (HPEC) . Waltham, MA

work page 2018

[14] [14]

Green and D.A

O. Green and D.A. Bader. 2016. cuSTINGER: Supporting Dynamic Graph Algo- rithms for GPUS. In IEEE Proc. High Performance Extreme Computing (HPEC) . Waltham, MA

work page 2016

[15] [15]

Mark Harris, John Owens, Shubho Sengupta, Yao Zhang, and Andrew Davidson

work page

[16] [16]

CUDPP: CUDA Data Parallel Primitives library. (2007)

work page 2007

[17] [17]

Mark Harris, Shubhabrata Sengupta, and John D Owens. 2007. Parallel prefix sum (scan) with CUDA. GPU gems 3, 39 (2007), 851–876

work page 2007

[18] [18]

Daniel Jünger, Christian Hundt, and Bertil Schmidt. 2018. WarpDrive: Massively Parallel Hashing on Multi-GPU Nodes. In 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS) . IEEE, 441–450

work page 2018

[19] [19]

Tomas Karnagel, Rene Mueller, and Guy M Lohman. 2015. Optimizing GPU- accelerated Group-By and Aggregation. ADMS at VLDB 8 (2015), 20

work page 2015

[20] [20]

Farzad Khorasani, Mehmet E Belviranli, Rajiv Gupta, and Laxmi N Bhuyan

work page

[21] [21]

In International Conference on Parallel Architecture and Compilation (PACT)

Stadium Hashing: Scalable and Flexible Hashing on Gpus. In International Conference on Parallel Architecture and Compilation (PACT) . IEEE, 63–74

work page

[22] [22]

Changkyu Kim, Tim Kaldewey, Victor W Lee, Eric Sedlar, Anthony D Nguyen, Nadathur Satish, Jatin Chhugani, Andrea Di Blas, and Pradeep Dubey. 2009. Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs. Proceedings of the VLDB Endowment 2, 2 (2009), 1378–1389

work page 2009

[23] [23]

Ping Li, Anshumali Shrivastava, and Christian A Konig. 2012. GPU-based Minwise Hashing. In Proceedings of the 21st International Conference on World Wide Web . ACM, 565–566

work page 2012

[24] [24]

Peter Macko, Virendra J Marathe, Daniel W Margo, and Margo I Seltzer. 2015. LLAMA: Efficient Graph Analytics Using Large Multiversioned Arrays. In 31st IEEE Int’l Conf. on Data Engineering (ICDE). 363–374

work page 2015

[25] [25]

Tobias Maier, Peter Sanders, and Roman Dementiev. 2016. Concurrent Hash Tables: Fast and General?(!). In ACM SIGPLAN Notices, Vol. 51. ACM, 34

work page 2016

[26] [26]

NVIDIA. 2018. CUDF. (2018)

work page 2018

[27] [27]

Rasmus Pagh and Flemming Friche Rodler. 2004. Cuckoo Hashing. Journal of Algorithms 51, 2 (2004), 122–144

work page 2004

[28] [28]

Tony C Pan, Sanchit Misra, and Srinivas Aluru. 2018. Optimizing High Perfor- mance Distributed Memory Parallel Hash Tables for DNA k-mer Counting. In Optimizing High Performance Distributed Memory Parallel Hash Tables for DNA k-mer Counting. IEEE, 0

work page 2018

[29] [29]

Andrea W Richa, M Mitzenmacher, and R Sitaraman. 2001. The Power of Two Ran- dom Choices: A Survey of Techniques and Results. Combinatorial Optimization 9 (2001), 255–304

work page 2001

[30] [30]

Dipanjan Sengupta, Narayanan Sundaram, Xia Zhu, Theodore L Willke, Jeffrey Young, Matthew Wolf, and Karsten Schwan. 2016. Graphin: An Online High Performance Incremental Graph Processing Framework. In European Conference on Parallel Processing. Springer, 319–333

work page 2016

[31] [31]

Shubhabrata Sengupta, Mark Harris, Yao Zhang, and John D Owens. 2007. Scan Primitives for GPU Computing. In Graphics hardware. 97–106

work page 2007

[32] [32]

Martin Winter, Rhaleb Zayer, and Markus Steinberger. 2017. Autonomous, Inde- pendent Management of Dynamic Graphs on GPUs. In International Supercom- puting Conference. Springer, 97–119. 11

work page 2017