pith. sign in

arxiv: 1906.09430 · v1 · pith:ZRCSB2HUnew · submitted 2019-06-22 · 💻 cs.DS

Algorithms for Similarity Search and Pseudorandomness

Pith reviewed 2026-05-25 18:11 UTC · model grok-4.3

classification 💻 cs.DS
keywords approximate near neighbor searchlocality-sensitive hashingspace-time tradeoffspseudorandom numberssimilarity joinslower boundshash functions
0
0 comments X

The pith

An improved locality-sensitive hashing framework solves approximate near neighbor search with fewer hash evaluations and lower word-RAM complexity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an improved framework for the approximate near neighbor problem that uses locality-sensitive hashing. This framework reduces the number of hash function evaluations and the word-RAM complexity relative to the standard framework. It also supplies frameworks for space-time tradeoffs with tight upper and lower bounds under cosine similarity and a novel method for search over sets that comes with a matching lower bound. Separate constructions give deterministic algorithms for high-quality pseudorandom numbers in constant time with near-optimal space and randomized hash families that nearly match cell-probe lower bounds. These advances matter because they make similarity search in large data sets more efficient and strengthen the guarantees available for pseudorandom generation.

Core claim

The author claims that an improved locality-sensitive hashing framework reduces the number of evaluations of locality-sensitive hash functions and the word-RAM complexity for solving the approximate near neighbor problem, provides space-time tradeoffs with tight upper and lower bounds for cosine similarity, offers a novel approach for the problem on sets with a matching lower bound, shows optimality of a Boolean scheme, gives tight lower bounds for asymmetric hashing, and supplies deterministic and randomized methods for high-quality pseudorandom number generation with near-optimal space and time.

What carries the argument

The improved framework for the approximate near neighbor problem that uses locality-sensitive hashing to reduce hash evaluations and word-RAM complexity.

If this is right

  • The number of locality-sensitive hash function evaluations drops compared to the standard framework.
  • Word-RAM complexity improves for approximate near neighbor solutions.
  • Tight upper and lower bounds characterize space-time tradeoffs under cosine similarity.
  • A novel approach improves performance for search on sets while matching a lower bound.
  • Deterministic algorithms generate high-quality pseudorandom numbers in constant time with near-optimal space.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Practical retrieval systems could adopt the self-tuning variant to improve performance without manual tuning of parameters.
  • The matching lower bounds suggest that further asymptotic gains for set-based search are unlikely under the stated model.
  • The pseudorandom constructions could strengthen randomized algorithms that need high-quality bits while staying within tight space limits.

Load-bearing premise

Locality-sensitive hash families exist that preserve the required similarities with the probabilities stated in the standard definitions.

What would settle it

A data distribution or similarity measure where the new framework requires the same number of hash evaluations as the standard framework would show that the claimed reduction does not hold.

Figures

Figures reproduced from arXiv: 1906.09430 by Tobias Christiani.

Figure 1.1
Figure 1.1. Figure 1.1: The gap in the ρ-value between the best known upper and lower bounds for families of ((1 − α)d/2,(1 − β)d/2, p1, p2)-sensitive hash functions in d-dimensional Hamming space. 1.1.4 Beyond locality-sensitive hashing A common theme among recent advances in the area of theoretical approximate similarity search has been to move beyond standard locality￾sensitive hashing [14, 100, 149, 24, 16, 54, 56, 22]. The… view at source ↗
Figure 2.1
Figure 2.1. Figure 2.1: Upper bounds on the number of locality-sensitive hash functions from a [PITH_FULL_IMAGE:figures/full_fig_p036_2_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: reveals that the number of hash functions used by the Indyk [PITH_FULL_IMAGE:figures/full_fig_p036_2.png] view at source ↗
Figure 2.2
Figure 2.2. Figure 2.2: The number of locality-sensitive hash functions from a [PITH_FULL_IMAGE:figures/full_fig_p047_2_2.png] view at source ↗
Figure 2
Figure 2. Figure 2: where we have set [PITH_FULL_IMAGE:figures/full_fig_p047_2.png] view at source ↗
Figure 2.3
Figure 2.3. Figure 2.3: The number of locality-sensitive hash functions from a [PITH_FULL_IMAGE:figures/full_fig_p048_2_3.png] view at source ↗
Figure 4.1
Figure 4.1. Figure 4.1: Chosen Path uses a branching process to associate each vector [PITH_FULL_IMAGE:figures/full_fig_p093_4_1.png] view at source ↗
Figure 4.2
Figure 4.2. Figure 4.2: Exponent when searching for a vector with Jaccard similarity [PITH_FULL_IMAGE:figures/full_fig_p098_4_2.png] view at source ↗
Figure 4
Figure 4. Figure 4: ). The details of the comparisons are given in Appendix 4.8. [PITH_FULL_IMAGE:figures/full_fig_p107_4.png] view at source ↗
Figure 4.3
Figure 4.3. Figure 4.3: The difference ρ − ρminhash comparing Chosen Path and MinHash in terms of Braun-Blanquet similarities 0 < b2 < b1 < 1. of Theorem 4.1. In Appendix 4.8 we show that ρ < ρdatadep whenever b2 ≤ 1/5, or equivalently, whenever j2 ≤ 1/9. Revisiting the above example, for j1 = 0.2 and j2 = 0.1 we have ρdatadep = 0.6875 which is about 0.043 more than Chosen Path [PITH_FULL_IMAGE:figures/full_fig_p108_4_3.png] view at source ↗
Figure 4.4
Figure 4.4. Figure 4.4: The difference ρ − ρangular comparing Chosen Path and angular LSH in terms of Braun￾Blanquet similarities 0 < b2 < b1 < 1. space by O’Donnell et al. [121] which we use, through a reduction, to show LSH lower bounds under Braun-Blanquet similarity. Lower bounds for locality-sensitive maps. Because our upper bound is based on a locality-sensitive map MB and not LSH-based we first show that LSH lower bounds… view at source ↗
Figure 4.5
Figure 4.5. Figure 4.5: The difference ρ − ρdatadep comparing Chosen Path and data-dependent LSH in terms of Braun-Blanquet similarities 0 < b2 < b1 < 1. In the area of the parameter space that is colored blue we have that ρ ≤ ρdatadep while for the red area it holds that ρ > ρdatadep. Lemma 4.4. Suppose we have a (s1,s2, m1, m2)-sensitive family of maps M. Then we can construct a (s1,s2, p1, p2)-sensitive family of hash functi… view at source ↗
Figure 4.6
Figure 4.6. Figure 4.6: Solution with lowest ρ-value for the (j1, j2)-approximate Jaccard similarity search problem for different values of β. Blue is angular LSH. Green is MinHash. Red is Chosen Path. Note the difference in the axes for different values of β as it must hold that 0 ≤ j2 ≤ j1 ≤ β [PITH_FULL_IMAGE:figures/full_fig_p121_4_6.png] view at source ↗
Figure 5.1
Figure 5.1. Figure 5.1: Path termination depth in the Chosen Path Tree [PITH_FULL_IMAGE:figures/full_fig_p142_5_1.png] view at source ↗
Figure 5
Figure 5. Figure 5: shows the join time speedup that [PITH_FULL_IMAGE:figures/full_fig_p150_5.png] view at source ↗
Figure 5.2
Figure 5.2. Figure 5.2: Join time of CPSJoin with at least 90% recall relative to AllPairs. on the presence of rare tokens in the data to perform well. This difference is showcased with the synthetic TOKEN data sets. BayesLSH. The poor performance of BayesLSH compared to the other algorithms (BayesLSH was always slower) can most likely be tracked down to differences in the implementation of the candidate generation methods of B… view at source ↗
Figure 5
Figure 5. Figure 5: (a) shows the effect of the brute force limit on the join time. [PITH_FULL_IMAGE:figures/full_fig_p153_5.png] view at source ↗
Figure 5.3
Figure 5.3. Figure 5.3: Relative join time for CPSJoin with at least [PITH_FULL_IMAGE:figures/full_fig_p154_5_3.png] view at source ↗
Figure 5
Figure 5. Figure 5: (c) shows the effect of sketch length on the join time. There [PITH_FULL_IMAGE:figures/full_fig_p155_5.png] view at source ↗
read the original abstract

We study the problem of approximate near neighbor (ANN) search and show the following results: - An improved framework for solving the ANN problem using locality-sensitive hashing, reducing the number of evaluations of locality-sensitive hash functions and the word-RAM complexity compared to the standard framework. - A framework for solving the ANN problem with space-time tradeoffs as well as tight upper and lower bounds for the space-time tradeoff of framework solutions to the ANN problem under cosine similarity. - A novel approach to solving the ANN problem on sets along with a matching lower bound, improving the state of the art. - A self-tuning version of the algorithm is shown through experiments to outperform existing similarity join algorithms. - Tight lower bounds for asymmetric locality-sensitive hashing which has applications to the approximate furthest neighbor problem, orthogonal vector search, and annulus queries. - A proof of the optimality of a well-known Boolean locality-sensitive hashing scheme. We study the problem of efficient algorithms for producing high-quality pseudorandom numbers and obtain the following results: - A deterministic algorithm for generating pseudorandom numbers of arbitrarily high quality in constant time using near-optimal space. - A randomized construction of a family of hash functions that outputs pseudorandom numbers of arbitrarily high quality with space usage and running time nearly matching known cell-probe lower bounds.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper studies approximate near neighbor (ANN) search and pseudorandomness. For ANN it claims an improved LSH framework reducing hash evaluations and word-RAM complexity versus the standard framework; a space-time tradeoff framework with tight upper/lower bounds under cosine similarity; a novel set-similarity approach with matching lower bound improving the state of the art; experimental evidence that a self-tuning variant outperforms existing similarity-join algorithms; tight lower bounds for asymmetric LSH (with applications to furthest-neighbor, orthogonal-vector, and annulus queries); and a proof that a well-known Boolean LSH scheme is optimal. For pseudorandomness it claims a deterministic constant-time generator of arbitrarily high quality using near-optimal space and a randomized hash-family construction whose space and time nearly match known cell-probe lower bounds.

Significance. If the stated reductions, tradeoffs, and matching bounds hold, the work would strengthen the theoretical foundations of LSH-based ANN and provide near-optimal pseudorandom constructions. The explicit matching of upper and lower bounds for the cosine-similarity space-time tradeoff and the set-similarity result are particularly valuable; the experimental self-tuning claim is independent of the theoretical contributions.

minor comments (2)
  1. The abstract states that the self-tuning algorithm 'outperforms existing similarity join algorithms' via experiments, but the manuscript should clarify the precise baselines, data sets, and metrics used so that the experimental claim can be evaluated independently of the theoretical sections.
  2. Notation for the improved LSH framework (number of hash evaluations, word-RAM cost) is introduced in the abstract without cross-reference to the section that defines the standard framework; adding an explicit pointer would improve readability.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment, detailed summary of our contributions, and recommendation of minor revision. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity; theoretical extensions are self-contained

full rationale

The paper presents algorithmic improvements to LSH-based ANN search, space-time tradeoffs with matching upper/lower bounds, a novel set-based approach, and pseudorandomness constructions. These are framed as new analyses, frameworks, and proofs extending standard LSH assumptions rather than redefining quantities in terms of themselves or calling fitted parameters predictions. No load-bearing self-citations or ansatzes are indicated in the abstract or claimed results; the work is self-contained against external benchmarks like prior LSH families and cell-probe lower bounds.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; all such details reside in the full paper.

pith-pipeline@v0.9.0 · 5757 in / 1095 out tokens · 31053 ms · 2026-05-25T18:11:27.800787+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

176 extracted references · 176 canonical work pages · 7 internal anchors

  1. [1]

    ACM paris kanellakis theory and practice award

    ACM. ACM paris kanellakis theory and practice award. https: //awards.acm.org/award_winners/charikar_0308379,

  2. [2]

    [Online; accessed 26-April-2018]

  3. [3]

    Aggarwal, D

    C. Aggarwal, D. A. Keim, and A. Hinneburg. On the surprising behaviour of distance metrics in high dimensional space. In Proc. ICDT ’01, pages 420–434, 2001

  4. [4]

    A. Agresti. Bounds on the extinction time distribution of a branch- ing process. Advances in Applied Probability, 6(2):322–335, 1974

  5. [5]

    T. D. Ahle, M. Aumüller, and R. Pagh. Parameter-free locality sensitive hashing for spherical range reporting. In Proc. SODA ’17, pages 239–256, 2017

  6. [6]

    T. D. Ahle, R. Pagh, I. P . Razenshteyn, and F. Silvestri. On the complexity of inner product similarity join. In Proc. PODS’16 , pages 151–164, 2016

  7. [7]

    Alman and R

    J. Alman and R. Williams. Probabilistic polynomials and hamming nearest neighbors. In Proc. FOCS ’15, pages 136–150, 2015

  8. [8]

    Alon and N

    N. Alon and N. Asaf. k-wise independent random graphs. In Proc. FOCS ’08, pages 813–822, 2008

  9. [9]

    N. Alon, O. Goldreich, J. Håstad, and R. Peralta. Simple construc- tions of almost k-wise independent random variables. Random Structures & Algorithms, 3(3):289–304, 1992

  10. [10]

    A. Andoni. Nearest neighbor search: the old, the new, and the impossible. PhD thesis, MIT, 2009. 220 Bibliography

  11. [11]

    Andoni and P

    A. Andoni and P . Indyk. Efficient algorithms for substring near neighbor problem. In Proc. SODA ’06, pages 1203–1212, 2006

  12. [12]

    Andoni and P

    A. Andoni and P . Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In Proc. FOCS ’06, pages 459–468, 2006

  13. [13]

    Andoni and P

    A. Andoni and P . Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM, 51(1):117–122, 2008

  14. [14]

    Andoni, P

    A. Andoni, P . Indyk, T. Laarhoven, I. Razenshteyn, and L. Schmidt. Practical and optimal lsh for angular distance. In Proc. NIPS ’15, pages 1225–1233, 2015

  15. [15]

    Andoni, P

    A. Andoni, P . Indyk, H. L. Nguyen, and I. P . Razenshteyn. Beyond locality-sensitive hashing. In Proc. SODA ’14 , pages 1018–1028, 2014

  16. [16]

    Andoni, P

    A. Andoni, P . Indyk, and M. Patrascu. On the optimality of the dimensionality reduction method. In Proc. FOCS ’06, pages 449–458, 2006

  17. [17]

    Andoni, T

    A. Andoni, T. Laarhoven, I. P . Razenshteyn, and E. Waingarten. Optimal hashing-based time-space trade-offs for approximate near neighbors. In Proc. SODA ’17, pages 47–66, 2017

  18. [18]

    Andoni and I

    A. Andoni and I. Razenshteyn. Optimal data-dependent hashing for approximate near neighbors. In Proc. STOC ’15, pages 793–801, 2015

  19. [19]

    Andoni, I

    A. Andoni, I. P . Razenshteyn, and N. S. Nosatzki. LSH forest: Practical algorithms made theoretical. In Proc. SODA ’17, pages 67–78, 2017

  20. [20]

    Andoni and I

    A. Andoni and I. Razensteyn. Tight lower bounds for data- dependent locality-sensitive hashing. In Proc. SoCG ’16 , pages 9:1–9:11, 2016

  21. [21]

    Arasu, V

    A. Arasu, V . Ganti, and R. Kaushik. Efficient exact set-similarity joins. In Proceedings of the 32nd international conference on Very large data bases, pages 918–929. VLDB Endowment, 2006. Bibliography 221

  22. [22]

    Augsten and M

    N. Augsten and M. H. Böhlen. Similarity joins in relational database systems. Synthesis Lectures on Data Management, 5(5):1–124, 2013

  23. [23]

    Aumüller, T

    M. Aumüller, T. Christiani, R. Pagh, and F. Silvestri. Distance- sensitive hashing. pages 89–104, 2018

  24. [24]

    R. J. Bayardo, Y. Ma, and R. Srikant. Scaling up all pairs similarity search. In Proc. 16th World Wide Web Conference (WWW) , pages 131–140, 2007

  25. [25]

    Becker, L

    A. Becker, L. Ducas, N. Gama, and T. Laarhoven. New directions in nearest neighbor searching with applications to lattice sieving. In Proc. SODA ’16, pages 10–24, 2016

  26. [26]

    Benyamini and J

    Y. Benyamini and J. Lindenstrauss. Geometric nonlinear functional analysis, volume 48. American Mathematical Soc., Providence, Rhode Island, 1998

  27. [27]

    Bhattacharya, R

    M. Bhattacharya, R. Creutzburg, and J. Astola. Some historical notes on the number theoretic transform. In Proc. 2004 Int. TICS Workshop on Spectral Methods and Multirate Signal Processing , 2004

  28. [28]

    Bostan and É

    A. Bostan and É. Schost. Polynomial evaluation and interpolation on special sets of points. J. Complexity, 21(4):420–446, 2005

  29. [29]

    Braun-Blanquet

    J. Braun-Blanquet. Plant sociology. The study of plant communities . McGraw-Hill, 1932

  30. [30]

    Braverman

    M. Braverman. Polylogarithmic independence fools AC0 circuits. J. ACM, 57(5):28:1–28:10, 2010

  31. [31]

    A. Z. Broder. On the resemblance and containment of documents. In Compression and Complexity of Sequences 1997. Proceedings , pages 21–29. IEEE, 1997

  32. [32]

    A. Z. Broder. Identifying and filtering near-duplicate documents. In Proc. Symp. on Combinatorial Pattern Matching (CPM) , pages 1–10. Springer, 2000

  33. [33]

    A. Z. Broder, M. Charikar, A. M. Frieze, and M. Mitzenmacher. Min-wise independent permutations (extended abstract). In Proc. STOC ’98, pages 327–336, 1998. 222 Bibliography

  34. [34]

    A. Z. Broder, M. Charikar, A. M. Frieze, and M. Mitzenmacher. Min- wise independent permutations. J. Comput. Syst. Sci., 60(3):630–659, 2000

  35. [35]

    A. Z. Broder, A. M. Frieze, and E. Upfal. Static and dynamic path selection on expander graphs: a random walk approach. Random Structures & Algorithms, 14(1):87–109, 1999

  36. [36]

    A. Z. Broder, S. C. Glassman, M. S. Manasse, and G. Zweig. Syn- tactic clustering of the web. Computer Networks, 29(8-13):1157–1166, 1997

  37. [37]

    A. Z. Broder, S. C. Glassman, M. S. Manasse, and G. Zweig. Syn- tactic clustering of the web. Computer Networks and ISDN Systems, 29(8):1157–1166, 1997

  38. [38]

    Burhman, P

    H. Burhman, P . B. Miltersen, J. Radhakrishnan, and S. Venkatesh. Are bitvectors optimal? SIAM J. Comput., 31(6):1723–1744, 2002

  39. [39]

    M. Capalbo. Explicit bounded-degree unique-neighbor concentra- tors. Combinatorica, 25(4):379–391, 2005

  40. [40]

    Capalbo, O

    M. Capalbo, O. Reingold, S. Vadhan, and A. Wigderson. Random- ness conductors and constant-degree lossless expanders. In Proc. STOC ’02, pages 659–668, 2002

  41. [41]

    J. L. Carter and M. N. Wegman. Universal classes of hash functions. In Proc. STOC ’77, pages 106–112, 1977

  42. [42]

    J. L. Carter and M. N. Wegman. Universal classes of hash functions. J. Comput. Syst. Sci., 18(2):143–154, 1979

  43. [43]

    J. L. Carter and M. N. Wegman. New hash functions and their use in authentication and set equality. J. Comput. System Sci., 22(3):265– 279, 1981

  44. [44]

    Chakrabarti, V

    A. Chakrabarti, V . Satuluri, A. Srivathsan, and S. Parthasarathy. Locality sensitive hashing for similarity search and estima- tion. https://sites.google.com/site/lshallpairs/,

  45. [45]

    Bibliography 223

    [Accessed 15-May-2017]. Bibliography 223

  46. [46]

    Chakrabarti, V

    A. Chakrabarti, V . Satuluri, A. Srivathsan, and S. Parthasarathy. A bayesian perspective on locality sensitive hashing with extensions for kernel methods. ACM Transactions on Knowledge Discovery from Data (TKDD), 10(2):19, 2015

  47. [47]

    Chakraborty, E

    D. Chakraborty, E. Goldenberg, and M. Kouck `y. Streaming al- gorithms for embedding and computing edit distance in the low distance regime. In Proc. STOC ’16, pages 712–725, 2016

  48. [48]

    J. M. Chambers, C. L. Mallows, and B. W. Stuck. A method for sim- ulating stable random variables. Jour. Am. Stat. Assoc., 71(354):340– 344, 1976

  49. [49]

    Charikar

    M. Charikar. Similarity estimation techniques from rounding algo- rithms. In Proc. STOC ’02, pages 380–388, 2002

  50. [50]

    Chaudhuri, V

    S. Chaudhuri, V . Ganti, and R. Kaushik. A primitive operator for similarity joins in data cleaning. In Proc. 22nd Conference on Data Engineering (ICDE), page 5, 2006

  51. [51]

    Chierichetti and R

    F. Chierichetti and R. Kumar. Lsh-preserving functions and their applications. J. ACM, 62(5):33, 2015

  52. [52]

    Chierichetti, R

    F. Chierichetti, R. Kumar, A. Panconesi, and E. Terolli. The distor- tion of locality sensitive hashing. In Proc. ITCS ’17, pages 54:1–54:18, 2017

  53. [53]

    S. Choi, S. Cha, and C. C. Tappert. A survey of binary similarity and distance measures. J. Syst. Cybern. Informatics, 8(1):43–48, 2010

  54. [54]

    B. Chor, O. Goldreich, J. Hastad, J. Freidmann, S. Rudich, and R. Smolensky. The bit extration problem or t-resilient functions. In Proc. FOCS ’85, pages 396–407, 1985

  55. [55]

    Fast Locality-Sensitive Hashing Frameworks for Approximate Near Neighbor Search

    T. Christiani. Fast locality-sensitive hashing for approximate near neighbor search. CoRR, abs/1708.07586, 2017

  56. [56]

    Christiani

    T. Christiani. A framework for similarity search with space-time tradeoffs using locality-sensitive filtering. In Proc. SODA ’17, pages 31–46, 2017

  57. [57]

    Christiani and R

    T. Christiani and R. Pagh. Generating k-independent variables in constant time. In Proc. FOCS ’14, pages 196–205, 2014. 224 Bibliography

  58. [58]

    Christiani and R

    T. Christiani and R. Pagh. Set similarity search beyond minhash. In Proc. STOC ’17, pages 1094–1107, 2017

  59. [59]

    Scalable and robust set similarity join

    T. Christiani, R. Pagh, and J. Sivertsen. Scalable and robust set similarity join. CoRR, abs/1707.06814, 2017

  60. [60]

    Christiani, R

    T. Christiani, R. Pagh, and M. Thorup. From independence to expansion and back again. In Proc. STOC ’15, pages 813–820, 2015

  61. [61]

    Chung, M

    K. Chung, M. Mitzenmacher, and S. P . Vadhan. Why simple hash functions work: Exploiting the entropy in a data stream. Theory of Computing, 9:897–945, 2013

  62. [62]

    E. Cohen. Size-estimation framework with applications to transitive closure and reachability. J. Comp. Syst. Sci., 55(3):441–453, 1997

  63. [63]

    Cohen, M

    E. Cohen, M. Datar, S. Fujiwara, A. Gionis, P . Indyk, R. Motwani, J. D. Ullman, and C. Yang. Finding interesting associations with- out support pruning. IEEE Transactions on Knowledge and Data Engineering, 13(1):64–78, 2001

  64. [64]

    Cohen and H

    E. Cohen and H. Kaplan. Leveraging discarded samples for tighter estimation of multiple-set aggregates. ACM SIGMETRICS Perfor- mance Evaluation Review, 37(1):251–262, 2009

  65. [65]

    Cohen and D

    J. Cohen and D. Kane. Bounds on the independence required for cuckoo hashing. http://web.mit.edu/jscohen/www/ cuckoo.pdf, 2009. [Online; accessed 29-April-2018]

  66. [66]

    Dahlgaard, M

    S. Dahlgaard, M. B. T. Knudsen, and M. Thorup. Fast similarity sketching. In Proc. FOCS ’17, pages 663–671, 2017

  67. [67]

    Dasgupta, R

    A. Dasgupta, R. Kumar, and T. Sarlós. Fast locality-sensitive hash- ing. In Proc. SIGKDD ’11, pages 1073–1081, 2011

  68. [68]

    Datar, N

    M. Datar, N. Immorlica, P . Indyk, and V . S. Mirrokni. Locality- sensitive hashing scheme based on p-stable distributions. In Proc. SOCG ’04, pages 253–262, 2004

  69. [69]

    On the Complexity of Closest Pair via Polar-Pair of Point-Sets

    R. David, Karthik C. S., and B. Laekhanukit. The curse of medium dimension for geometric problems in almost every norm. CoRR, abs/1608.03245, 2016. Bibliography 225

  70. [70]

    de Berg, M

    M. de Berg, M. van Kreveld, M. Overmars, and O. C. Schwarzkopf. Computational geometry. Springer, Berlin, third edition, 2008

  71. [71]

    Dietzfelbinger

    M. Dietzfelbinger. On randomness in hash functions (invited talk). In Proc. STACS ’12, pages 25–28, 2012

  72. [72]

    Dietzfelbinger and F

    M. Dietzfelbinger and F. Meyer auf der Heide. A new universal class of hash functions and dynamic hashing in real time. In Proc. ICALP ’90, pages 6–19, 1990

  73. [73]

    Dietzfelbinger and M

    M. Dietzfelbinger and M. Rink. Applications of a splitting trick. In Proc. ICALP ’09, pages 354–365, 2009

  74. [74]

    Dietzfelbinger and P

    M. Dietzfelbinger and P . Woelfel. Almost random graphs with simple hash functions. In Proc. STOC ’03, pages 629–638, 2003

  75. [75]

    M. Dubiner. Bucketing coding and information theory for the statistical high-dimensional nearest-neighbor problem. IEEE Trans. Information Theory, 56(8):4166–4179, 2010

  76. [76]

    Duhamel and M

    P . Duhamel and M. Vetterli. Fast Fourier transforms: a tutorial review and state of the art. Signal Processing, 19(4):259–299, 1990

  77. [77]

    Eshghi and S

    K. Eshghi and S. Rajaram. Locality sensitive hash functions based on concomitant rank order statistics. In Proc. KDD ’08 , pages 221–229, 2008

  78. [78]

    M. L. Fredman, J. Komlós, and E. Szemerédi. Storing a sparse table with 0(1) worst case access time. J. ACM, 31(3):538–544, 1984

  79. [79]

    Gao and T

    S. Gao and T. Mateer. Additive fast Fourier transforms over finite fields. IEEE Trans. Inf. Theory, 56(12):6265–6272, 2010

  80. [80]

    Gionis, P

    A. Gionis, P . Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In Proc. VLDB ’99, pages 518–529, 1999

Showing first 80 references.