The Role of Local Intrinsic Dimensionality in Benchmarking Nearest Neighbor Search

Martin Aum\"uller; Matteo Ceccarello

arxiv: 1907.07387 · v1 · pith:6RAADYBVnew · submitted 2019-07-17 · 💻 cs.IR · cs.DB

The Role of Local Intrinsic Dimensionality in Benchmarking Nearest Neighbor Search

Martin Aum\"uller , Matteo Ceccarello This is my paper

Pith reviewed 2026-05-24 20:16 UTC · model grok-4.3

classification 💻 cs.IR cs.DB

keywords nearest neighbor searchlocal intrinsic dimensionalitybenchmarkingquery difficultydataset diversityperformance evaluationquery sets

0 comments

The pith

Local intrinsic dimensionality selects query sets spanning wide difficulty ranges in real-world nearest neighbor search benchmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that local intrinsic dimensionality can be used to pick queries of many different difficulty levels from the same real datasets. This makes it possible to study how nearest neighbor search running times change with query hardness in a controlled way. The work also finds that standard real-world datasets are not diverse, because running-time results on any one of them predict results on the others quite well. A reader cares because current benchmarks may give an incomplete picture of how algorithms behave across varying conditions.

Core claim

The authors establish that local intrinsic dimensionality distributions allow curation of query sets across a broad difficulty spectrum on real data, that different such distributions produce measurable differences in search running times, and that common benchmark datasets exhibit low diversity because performance outcomes transfer reliably between them.

What carries the argument

Local intrinsic dimensionality (LID) as a scalar measure of local data complexity around each query point, used both to select queries and to visualize performance behavior.

If this is right

Different LID distributions of queries produce observable differences in running-time performance across nearest neighbor search implementations.
Visualization methods based on LID give a finer view of how search algorithms behave on individual queries.
Results obtained on any single real-world dataset transfer well to other such datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Benchmark suites could adopt LID-based query selection as a standard way to ensure coverage of easy and hard cases.
The observed lack of diversity suggests that adding more similar real datasets may not increase the range of tested conditions.
Synthetic data generators tuned to produce specific LID profiles might complement existing benchmarks for more targeted testing.

Load-bearing premise

Local intrinsic dimensionality accurately reflects the computational difficulty of nearest neighbor queries.

What would settle it

Finding no consistent difference in search running times when query sets are deliberately chosen from different LID ranges across multiple datasets and algorithms.

Figures

Figures reproduced from arXiv: 1907.07387 by Martin Aum\"uller, Matteo Ceccarello.

**Figure 1.** Figure 1: LID distribution for each dataset. Ticks below the distribution curves represent single queries. Lines within each distribution curve correspond to the 25, 50 and 75 percentile. The red line marks the 10 000 largest estimated LID, which we use as a threshold value to define hard query sets. nearest neighbor search algorithms. Fashion-MNIST is considered as a replacement for MNIST, which is usually conside… view at source ↗

**Figure 2.** Figure 2: Recall-QPS (1/s) tradeoff – up and to the right is better; top: SIFT, bottom: GLOVE-2M; left: easy, middle: middle, right: hard. Objectives of the Experiments Our experiments are tailored to answer the following questions: (Q1) How does the LID of a query set influence the running time performance? (Section 5.1) (Q2) How diverse are measurements obtained on datasets? Do relative differences between the pe… view at source ↗

**Figure 3.** Figure 3: Recall-QPS (1/s) tradeoff – up and to the right is better; left: ONNG, right: Annoy; (E) — easy, (M) — medium, (H) — hard, (D) — diverse. 5.1 Influence of LID on Performance [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: Ranking of algorithms on five different datasets, according to recall ≥ 0.75 and ≥ 0.9, and according to two different performance measures: number of queries per second (left) and number of distance computations (right). Both plots report the ratio with the best performing algorithm on each dataset: for the queries per second metric a larger ratio is better, for the number of distance computations metric,… view at source ↗

**Figure 5.** Figure 5: Distribution of performance for queries on the GLOVE-2M (medium difficulty) dataset. Looking just at the average performance can hide interesting behaviour [PITH_FULL_IMAGE:figures/full_fig_p011_5.png] view at source ↗

**Figure 6.** Figure 6: Distribution of Recall vs. LID plot on the GLOVE datset. Intensity reflects number of queries that achieve a combination of recall vs. LID. For the bottom plots, we observe that individual query times of all the algorithms are well concentrated around their mean [PITH_FULL_IMAGE:figures/full_fig_p012_6.png] view at source ↗

read the original abstract

This paper reconsiders common benchmarking approaches to nearest neighbor search. It is shown that the concept of local intrinsic dimensionality (LID) allows to choose query sets of a wide range of difficulty for real-world datasets. Moreover, the effect of different LID distributions on the running time performance of implementations is empirically studied. To this end, different visualization concepts are introduced that allow to get a more fine-grained overview of the inner workings of nearest neighbor search principles. The paper closes with remarks about the diversity of datasets commonly used for nearest neighbor search benchmarking. It is shown that such real-world datasets are not diverse: results on a single dataset predict results on all other datasets well.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

LID selects varied query difficulties in NN benchmarks and common datasets lack diversity, but other factors may confound the proxy.

read the letter

The one or two things to know: this work uses local intrinsic dimensionality to select query sets that cover a wide difficulty range on real-world datasets for nearest neighbor search benchmarking. It also concludes that commonly used datasets are not diverse, as performance results on one dataset predict those on the others quite well. The paper does a good job bringing in visualizations that give a better look at the internals of NN search implementations. It empirically examines how different LID distributions influence running time performance. These are useful additions for people who need to understand and improve benchmark practices. What is actually new is the application of LID specifically for query set selection in this domain and the diversity observation as a distinct finding. On the soft spots, the main one is whether LID really serves as a reliable proxy without other factors being ruled out. The stress test points out that things like local density variations or the empirical distance distributions to k nearest neighbors might account for as much or more of the variance in recall or time. If the paper doesn't show comparisons or controls for those, the claim that LID allows choosing wide difficulty ranges is less secure. The diversity point could be affected if all the datasets have similar unmeasured properties. This is aimed at researchers in information retrieval and databases who focus on algorithm testing and benchmarking. Readers who care about making evaluations more robust will find the visualizations and the LID idea helpful. It deserves a serious referee because it engages with a real problem in the subfield and offers concrete suggestions, even if some aspects of the evidence could be tightened. I would recommend putting it through peer review.

Referee Report

3 major / 2 minor

Summary. The paper claims that local intrinsic dimensionality (LID) enables selection of query sets spanning a wide range of difficulties for nearest neighbor search on real-world datasets; that varying LID distributions affect the running-time performance of NN implementations (studied via new visualizations of algorithm internals); and that commonly used real-world datasets lack diversity because performance results on any single dataset predict results on the others well.

Significance. If the empirical claims hold after proper controls and documentation, the work could improve NN-search benchmarking by supplying a controllable proxy (LID) for query difficulty and by cautioning the community against treating a small collection of datasets as representative. The introduced visualizations might also aid diagnosis of algorithm behavior beyond aggregate recall or time metrics.

major comments (3)

[Abstract / query-set selection section] Abstract and the section describing the LID-based query-set construction: the central claim that LID distributions permit selection of query sets with a wide difficulty range (and that these distributions affect running times) requires evidence that LID is the dominant driver. No controls or comparisons are described against other per-query factors such as the empirical distribution of distances to the k nearest neighbors, local density, or cluster structure; without such controls the proxy assumption remains untested.
[Closing remarks on dataset diversity] The diversity claim (results on a single dataset predict results on all others): the manuscript must specify the exact datasets, the performance metrics used for the prediction test, the quantitative measure of predictive power (e.g., correlation or rank agreement), and any statistical significance assessment. The current statement is too coarse to evaluate whether the datasets truly share unmeasured properties that LID does not isolate.
[Empirical study section] Empirical study of LID effects on running time: the description supplies no information on the number of runs, variance estimation, hardware controls, or error bars, making it impossible to judge whether observed differences are attributable to LID rather than implementation or measurement artifacts.

minor comments (2)

[Introduction] Notation for LID and the precise definition of “difficulty” should be introduced earlier and used consistently.
[Visualization section] Figure captions for the new visualization concepts should explicitly state what each panel shows and how it relates to the LID-controlled query sets.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments identify areas where additional rigor and detail will strengthen the manuscript. We respond to each major comment below and indicate the planned revisions.

read point-by-point responses

Referee: [Abstract / query-set selection section] Abstract and the section describing the LID-based query-set construction: the central claim that LID distributions permit selection of query sets with a wide difficulty range (and that these distributions affect running times) requires evidence that LID is the dominant driver. No controls or comparisons are described against other per-query factors such as the empirical distribution of distances to the k nearest neighbors, local density, or cluster structure; without such controls the proxy assumption remains untested.

Authors: We acknowledge that the manuscript does not include explicit controls comparing LID against other per-query factors. The central contribution is the introduction of LID as a practical, controllable proxy together with visualizations that illustrate its effect on search behavior. We will revise the relevant sections to discuss potential confounding factors (distance distributions, local density, cluster structure) and to state explicitly that the current evidence is correlational rather than a controlled isolation of LID. This will be a textual clarification and limitation statement; no new experiments are planned at this stage. revision: partial
Referee: [Closing remarks on dataset diversity] The diversity claim (results on a single dataset predict results on all others): the manuscript must specify the exact datasets, the performance metrics used for the prediction test, the quantitative measure of predictive power (e.g., correlation or rank agreement), and any statistical significance assessment. The current statement is too coarse to evaluate whether the datasets truly share unmeasured properties that LID does not isolate.

Authors: We agree the claim is stated too coarsely. The experiments use the standard ANN-Benchmarks collection (SIFT, GloVe, MNIST, NYTimes, and similar). Performance is measured by recall@k and query time; predictive power is assessed via rank correlation of algorithm orderings across datasets. We will add the exact dataset list, the precise metrics, the correlation values obtained, and a brief note on statistical significance. These additions will make the diversity observation precise and reproducible. revision: yes
Referee: [Empirical study section] Empirical study of LID effects on running time: the description supplies no information on the number of runs, variance estimation, hardware controls, or error bars, making it impossible to judge whether observed differences are attributable to LID rather than implementation or measurement artifacts.

Authors: We accept this criticism on reproducibility. The experimental section will be expanded to report the number of independent runs per configuration, the hardware platform (CPU model, memory), the method used for variance estimation, and error bars on the time and recall plots. These details were present in our internal logs but omitted from the submitted text. revision: yes

Circularity Check

0 steps flagged

No circularity; claims are empirical observations from data

full rationale

The paper's core results—that LID distributions can be used to select query sets spanning difficulty ranges and that results on one real-world dataset predict others—are presented as direct empirical findings from measurements and visualizations on multiple datasets. No equations, fitted parameters renamed as predictions, self-definitional steps, or load-bearing self-citations appear in the abstract or described claims; the derivation chain consists of data-driven comparisons rather than reductions to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; no details on any fitted values or background assumptions beyond the use of LID as a difficulty measure.

pith-pipeline@v0.9.0 · 5636 in / 1025 out tokens · 25355 ms · 2026-05-24T20:16:24.811188+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages · 4 internal anchors

[1]

In: FOCS’15

Alman, J., Williams, R.: Probabilistic polynomials and hamming nearest neighbors. In: FOCS’15. pp. 136–150

work page
[2]

In: KDD’15

Amsaleg, L., Chelly, O., Furon, T., Girard, S., Houle, M.E., Kawarabayashi, K.I., Nett, M.: Estimating local intrinsic dimensionality. In: KDD’15. pp. 29–38. ACM (2015)

work page 2015
[3]

In: Proceedings of the 2019 SIAM International Conference on Data Mining

Amsaleg, L., Chelly, O., Houle, M.E., Kawarabayashi, K.i., Radovanovi´ c, M., Treeratanajaru, W.: Intrinsic dimensionality estimation within tight localities. In: Proceedings of the 2019 SIAM International Conference on Data Mining. pp. 181–189. SIAM (2019)

work page 2019
[4]

5 Mayank Bawa, Tyson Condie, and Prasanna Ganesan

Aum¨ uller, M., Bernhardsson, E., Faithfull, A.: ANN-Benchmarks: A Benchmarking Tool for Approximate Nearest Neighbor Algorithms. In: SISAP’17. pp. 34–49 (2017). https://doi.org/10.1007/978-3-319-68474-1 3

work page doi:10.1007/978-3-319-68474-1 2017
[5]

In: 1st Workshop on Evaluation and Experimental Design in Data Mining and Machine Learning (EDML 2019) (2019), https://imada.sdu.dk/Research/EDML/

Aum¨ uller, M., Ceccarello, M.: Benchmarking nearest neighbor search: Inﬂuence of local intrinsic dimensionality and result diversity in real-world datasets. In: 1st Workshop on Evaluation and Experimental Design in Data Mining and Machine Learning (EDML 2019) (2019), https://imada.sdu.dk/Research/EDML/

work page 2019
[6]

Bernhardsson, E.: Annoy, https://github.com/spotify/annoy

work page
[7]

PVLDB 10(7), 769–780 (2017)

Casanova, G., Englmeier, E., Houle, M.E., Kr¨ oger, P., Nett, M., Schubert, E., Zimek, A.: Dimensional testing for reverse k-nearest neighbor search. PVLDB 10(7), 769–780 (2017)

work page 2017
[8]

ACM Comput

Ch´ avez, E., Navarro, G., Baeza-Yates, R., Marroqu´ ın, J.L.: Searching in metric spaces. ACM Comput. Surv. 33(3), 273–321 (Sep 2001). https://doi.org/10.1145/502807.502808 14 Martin Aum¨ uller and Matteo Ceccarello

work page doi:10.1145/502807.502808 2001
[9]

Curtin, R.R., Cline, J.R., Slagle, N.P., March, W.B., Ram, P., Mehta, N.A., Gray, A.G.: MLPACK: A scalable C++ machine learning library. J. of Machine Learning Research 14, 801–805 (2013)

work page 2013
[10]

In: NIPS 2014 Workshop on Software Engineering for Machine Learning (2014)

Edel, M., Soni, A., Curtin, R.R.: An automatic benchmarking system. In: NIPS 2014 Workshop on Software Engineering for Machine Learning (2014)

work page 2014
[11]

In: Data Mining Workshops (ICDMW)

Houle, M.E.: Dimensionality, discriminability, density and distance distributions. In: Data Mining Workshops (ICDMW). pp. 468–473. IEEE (2013)

work page 2013
[12]

In: SISAP’18

Houle, M.E., Schubert, E., Zimek, A.: On the correlation between local intrinsic dimensionality and outlierness. In: SISAP’18. pp. 177–191 (2018). https://doi.org/10.1007/978-3-030-02224-2 14

work page doi:10.1007/978-3-030-02224-2 2018
[13]

ArXiv e-prints (Oct 2018)

Iwasaki, M., Miyazaki, D.: Optimization of Indexing Based on k-Nearest Neighbor Graph for Proximity Search in High-dimensional Data. ArXiv e-prints (Oct 2018)

work page 2018
[14]

IEEE Trans

J´ egou, H., Douze, M., Schmid, C.: Product quantization for nearest neigh- bor search. IEEE Trans. Pattern Anal. Mach. Intell. 33(1), 117–128 (2011). https://doi.org/10.1109/TPAMI.2010.57

work page doi:10.1109/tpami.2010.57 2011
[15]

Billion-scale similarity search with GPUs

Johnson, J., Douze, M., J´ egou, H.: Billion-scale similarity search with gpus. CoRR abs/1702.08734 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017
[16]

Israel Journal of Mathematics 54(2), 129–138 (1986)

Johnson, W.B., Lindenstrauss, J., Schechtman, G.: Extensions of Lipschitz maps into Banach spaces. Israel Journal of Mathematics 54(2), 129–138 (1986)

work page 1986
[17]

Springer (2011)

Jolliﬀe, I.: Principal component analysis. Springer (2011)

work page 2011
[18]

Kriegel, H., Schubert, E., Zimek, A.: The (black) art of runtime evaluation: Are we comparing algorithms or implementations? Knowl. Inf. Syst. 52(2), 341–378 (2017)

work page 2017
[19]

In: NIPS’15

Levina, E., Bickel, P.J.: Maximum likelihood estimation of intrinsic dimension. In: NIPS’15. pp. 777–784 (2005)

work page 2005
[20]

Approximate Nearest Neighbor Search on High Dimensional Data --- Experiments, Analyses, and Improvement (v1.0)

Li, W., Zhang, Y., Sun, Y., Wang, W., Zhang, W., Lin, X.: Approximate nearest neighbor search on high dimensional data - experiments, analyses, and improvement (v1.0). CoRR abs/1610.02455 (2016)

work page internal anchor Pith review Pith/arXiv arXiv 2016
[21]

ArXiv e-prints (Mar 2016)

Malkov, Y.A., Yashunin, D.A.: Eﬃcient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs. ArXiv e-prints (Mar 2016)

work page 2016
[22]

Efficient Estimation of Word Representations in Vector Space

Mikolov, T., Chen, K., Corrado, G., Dean, J.: Eﬃcient estimation of word repre- sentations in vector space. CoRR abs/1301.3781 (2013)

work page internal anchor Pith review Pith/arXiv arXiv 2013
[23]

In: Empirical Methods in Natural Language Processing (EMNLP)

Pennington, J., Socher, R., Manning, C.D.: Glove: Global vectors for word repre- sentation. In: Empirical Methods in Natural Language Processing (EMNLP). pp. 1532–1543 (2014)

work page 2014
[24]

Computers & Operations Research 45, 12 – 24 (2014)

Smith-Miles, K., Baatar, D., Wreford, B., Lewis, R.: Towards objective measures of algorithm performance across instance space. Computers & Operations Research 45, 12 – 24 (2014)

work page 2014
[25]

In: KDD’17

Spring, R., Shrivastava, A.: Scalable and sustainable deep learning via randomized hashing. In: KDD’17. pp. 445–454 (2017). https://doi.org/10.1145/3097983.3098035

work page doi:10.1145/3097983.3098035 2017
[26]

Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms

Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for bench- marking machine learning algorithms. CoRR abs/1708.07747 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017

[1] [1]

In: FOCS’15

Alman, J., Williams, R.: Probabilistic polynomials and hamming nearest neighbors. In: FOCS’15. pp. 136–150

work page

[2] [2]

In: KDD’15

Amsaleg, L., Chelly, O., Furon, T., Girard, S., Houle, M.E., Kawarabayashi, K.I., Nett, M.: Estimating local intrinsic dimensionality. In: KDD’15. pp. 29–38. ACM (2015)

work page 2015

[3] [3]

In: Proceedings of the 2019 SIAM International Conference on Data Mining

Amsaleg, L., Chelly, O., Houle, M.E., Kawarabayashi, K.i., Radovanovi´ c, M., Treeratanajaru, W.: Intrinsic dimensionality estimation within tight localities. In: Proceedings of the 2019 SIAM International Conference on Data Mining. pp. 181–189. SIAM (2019)

work page 2019

[4] [4]

5 Mayank Bawa, Tyson Condie, and Prasanna Ganesan

Aum¨ uller, M., Bernhardsson, E., Faithfull, A.: ANN-Benchmarks: A Benchmarking Tool for Approximate Nearest Neighbor Algorithms. In: SISAP’17. pp. 34–49 (2017). https://doi.org/10.1007/978-3-319-68474-1 3

work page doi:10.1007/978-3-319-68474-1 2017

[5] [5]

In: 1st Workshop on Evaluation and Experimental Design in Data Mining and Machine Learning (EDML 2019) (2019), https://imada.sdu.dk/Research/EDML/

Aum¨ uller, M., Ceccarello, M.: Benchmarking nearest neighbor search: Inﬂuence of local intrinsic dimensionality and result diversity in real-world datasets. In: 1st Workshop on Evaluation and Experimental Design in Data Mining and Machine Learning (EDML 2019) (2019), https://imada.sdu.dk/Research/EDML/

work page 2019

[6] [6]

Bernhardsson, E.: Annoy, https://github.com/spotify/annoy

work page

[7] [7]

PVLDB 10(7), 769–780 (2017)

Casanova, G., Englmeier, E., Houle, M.E., Kr¨ oger, P., Nett, M., Schubert, E., Zimek, A.: Dimensional testing for reverse k-nearest neighbor search. PVLDB 10(7), 769–780 (2017)

work page 2017

[8] [8]

ACM Comput

Ch´ avez, E., Navarro, G., Baeza-Yates, R., Marroqu´ ın, J.L.: Searching in metric spaces. ACM Comput. Surv. 33(3), 273–321 (Sep 2001). https://doi.org/10.1145/502807.502808 14 Martin Aum¨ uller and Matteo Ceccarello

work page doi:10.1145/502807.502808 2001

[9] [9]

Curtin, R.R., Cline, J.R., Slagle, N.P., March, W.B., Ram, P., Mehta, N.A., Gray, A.G.: MLPACK: A scalable C++ machine learning library. J. of Machine Learning Research 14, 801–805 (2013)

work page 2013

[10] [10]

In: NIPS 2014 Workshop on Software Engineering for Machine Learning (2014)

Edel, M., Soni, A., Curtin, R.R.: An automatic benchmarking system. In: NIPS 2014 Workshop on Software Engineering for Machine Learning (2014)

work page 2014

[11] [11]

In: Data Mining Workshops (ICDMW)

Houle, M.E.: Dimensionality, discriminability, density and distance distributions. In: Data Mining Workshops (ICDMW). pp. 468–473. IEEE (2013)

work page 2013

[12] [12]

In: SISAP’18

Houle, M.E., Schubert, E., Zimek, A.: On the correlation between local intrinsic dimensionality and outlierness. In: SISAP’18. pp. 177–191 (2018). https://doi.org/10.1007/978-3-030-02224-2 14

work page doi:10.1007/978-3-030-02224-2 2018

[13] [13]

ArXiv e-prints (Oct 2018)

Iwasaki, M., Miyazaki, D.: Optimization of Indexing Based on k-Nearest Neighbor Graph for Proximity Search in High-dimensional Data. ArXiv e-prints (Oct 2018)

work page 2018

[14] [14]

IEEE Trans

J´ egou, H., Douze, M., Schmid, C.: Product quantization for nearest neigh- bor search. IEEE Trans. Pattern Anal. Mach. Intell. 33(1), 117–128 (2011). https://doi.org/10.1109/TPAMI.2010.57

work page doi:10.1109/tpami.2010.57 2011

[15] [15]

Billion-scale similarity search with GPUs

Johnson, J., Douze, M., J´ egou, H.: Billion-scale similarity search with gpus. CoRR abs/1702.08734 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017

[16] [16]

Israel Journal of Mathematics 54(2), 129–138 (1986)

Johnson, W.B., Lindenstrauss, J., Schechtman, G.: Extensions of Lipschitz maps into Banach spaces. Israel Journal of Mathematics 54(2), 129–138 (1986)

work page 1986

[17] [17]

Springer (2011)

Jolliﬀe, I.: Principal component analysis. Springer (2011)

work page 2011

[18] [18]

Kriegel, H., Schubert, E., Zimek, A.: The (black) art of runtime evaluation: Are we comparing algorithms or implementations? Knowl. Inf. Syst. 52(2), 341–378 (2017)

work page 2017

[19] [19]

In: NIPS’15

Levina, E., Bickel, P.J.: Maximum likelihood estimation of intrinsic dimension. In: NIPS’15. pp. 777–784 (2005)

work page 2005

[20] [20]

Approximate Nearest Neighbor Search on High Dimensional Data --- Experiments, Analyses, and Improvement (v1.0)

Li, W., Zhang, Y., Sun, Y., Wang, W., Zhang, W., Lin, X.: Approximate nearest neighbor search on high dimensional data - experiments, analyses, and improvement (v1.0). CoRR abs/1610.02455 (2016)

work page internal anchor Pith review Pith/arXiv arXiv 2016

[21] [21]

ArXiv e-prints (Mar 2016)

Malkov, Y.A., Yashunin, D.A.: Eﬃcient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs. ArXiv e-prints (Mar 2016)

work page 2016

[22] [22]

Efficient Estimation of Word Representations in Vector Space

Mikolov, T., Chen, K., Corrado, G., Dean, J.: Eﬃcient estimation of word repre- sentations in vector space. CoRR abs/1301.3781 (2013)

work page internal anchor Pith review Pith/arXiv arXiv 2013

[23] [23]

In: Empirical Methods in Natural Language Processing (EMNLP)

Pennington, J., Socher, R., Manning, C.D.: Glove: Global vectors for word repre- sentation. In: Empirical Methods in Natural Language Processing (EMNLP). pp. 1532–1543 (2014)

work page 2014

[24] [24]

Computers & Operations Research 45, 12 – 24 (2014)

Smith-Miles, K., Baatar, D., Wreford, B., Lewis, R.: Towards objective measures of algorithm performance across instance space. Computers & Operations Research 45, 12 – 24 (2014)

work page 2014

[25] [25]

In: KDD’17

Spring, R., Shrivastava, A.: Scalable and sustainable deep learning via randomized hashing. In: KDD’17. pp. 445–454 (2017). https://doi.org/10.1145/3097983.3098035

work page doi:10.1145/3097983.3098035 2017

[26] [26]

Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms

Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for bench- marking machine learning algorithms. CoRR abs/1708.07747 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017