An Empirical Comparison of FAISS and FENSHSES for Nearest Neighbor Search in Hamming Space
Pith reviewed 2026-05-25 17:23 UTC · model grok-4.3
The pith
Evaluating main-memory and secondary-memory nearest neighbor search systems reveals performance trade-offs in Hamming space.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a direct performance comparison using indexing speed, search latency, and RAM consumption between main-memory and secondary-memory nearest neighbor search implementations in Hamming space will illuminate previously unaddressed trade-offs in how these systems scale.
What carries the argument
The three evaluation metrics of indexing speed, search latency, and RAM consumption applied to nearest neighbor search in Hamming space.
If this is right
- If the comparison holds, system designers can select implementations according to whether speed or memory footprint is the tighter constraint.
- Secondary memory systems may become viable for very large data sets where main memory capacity is exceeded.
- The metrics provide a basis for predicting behavior on new data sets in similar spaces.
Where Pith is reading between the lines
- Extending the comparison to include energy use or network latency could reveal additional trade-offs for distributed settings.
- The approach might apply to other binary coding schemes beyond the ones tested.
- Results could guide development of systems that dynamically switch between memory types.
Load-bearing premise
The chosen main-memory and secondary-memory systems are representative, and the three metrics together with the data sets used capture the essential trade-offs.
What would settle it
An independent run of the same experiments that finds no consistent differences in the reported metrics across the systems would undermine the value of the comparison for understanding trade-offs.
read the original abstract
In this paper, we compare the performances of FAISS and FENSHSES on nearest neighbor search in Hamming space--a fundamental task with ubiquitous applications in nowadays eCommerce. Comprehensive evaluations are made in terms of indexing speed, search latency and RAM consumption. This comparison is conducted towards a better understanding on trade-offs between nearest neighbor search systems implemented in main memory and the ones implemented in secondary memory, which is largely unaddressed in literature.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to perform a comprehensive empirical comparison of FAISS (main-memory) and FENSHSES (secondary-memory) for nearest neighbor search in Hamming space, evaluating the systems on indexing speed, search latency, and RAM consumption in order to illuminate trade-offs between main-memory and secondary-memory implementations, a topic the authors state is largely unaddressed in the literature.
Significance. If the evaluations are conducted with proper controls, representative data sets, and reproducible protocols, the work could supply practical guidance on system selection for Hamming-space nearest-neighbor tasks in eCommerce and related domains; the explicit focus on the three metrics and the main-versus-secondary memory distinction is a clear framing that, when backed by data, would address a genuine gap.
major comments (1)
- [Abstract] The manuscript supplies only the abstract; no experimental protocol, data sets, hardware specifications, numerical results, or error-bar reporting appear in the provided text. Without these elements the central claim of 'comprehensive evaluations' cannot be verified and the representativeness of the chosen systems and metrics cannot be assessed.
Simulated Author's Rebuttal
We thank the referee for the detailed review and the opportunity to clarify the manuscript. We address the major comment point by point below.
read point-by-point responses
-
Referee: [Abstract] The manuscript supplies only the abstract; no experimental protocol, data sets, hardware specifications, numerical results, or error-bar reporting appear in the provided text. Without these elements the central claim of 'comprehensive evaluations' cannot be verified and the representativeness of the chosen systems and metrics cannot be assessed.
Authors: The referee is correct that the review materials contained only the abstract. The full manuscript (as submitted to arXiv:1906.10095) includes the experimental protocol, the specific datasets employed, hardware specifications, numerical results, and reporting of variability. In the revised version we will expand the main text to foreground these elements with dedicated sections on methodology, datasets, and results (including error bars) so that the evaluations can be fully verified and the representativeness of the systems and metrics assessed. revision: yes
Circularity Check
No circularity: pure empirical benchmark with no derivations or self-referential predictions
full rationale
The paper performs an external empirical comparison of two independent software systems (FAISS and FENSHSES) on standard metrics (indexing speed, search latency, RAM) using public datasets. No equations, fitted parameters, predictions derived from the paper's own data, or self-citations are used to justify any core claim. The central activity is measurement against external implementations, which is self-contained and falsifiable by re-running the benchmarks. No load-bearing step reduces to a definition, fit, or author citation chain.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
- [1]
-
[2]
Erik B. 2018. Annoy: Approximate Nearest Neighbors in C++/Python . https://pypi.org/project/annoy/ Python package version 1.13.0
work page 2018
-
[3]
E. P. Brenner, J. Zhao, A. Kutiyanawala, and Z. Yan. 2018. End-to-End Neu- ral Ranking for eCommerce Product Search. In SIGIR eCommerce Work- shop
work page 2018
-
[4]
A. Chaudhuri, P. Messina, S. Kokkula, A. Subramanian, A. Krishnan, S. Gandhi, A. Magnani, and V. Kandaswamy. 2018. A Smart System f or Se- lection of Optimal Product Images in E-Commerce. In Big Data
work page 2018
-
[5]
Qi Chen, Haidong Wang, Mingqin Li, Gang Ren, Scarlett Li, Jef- fery Zhu, Jason Li, Chuanjie Liu, Lintao Zhang, and Jingdong Wang
-
[6]
https://github.com/Microsoft/SPTAG
SPTAG: A library for fast approximate nearest neighbor searc h. https://github.com/Microsoft/SPTAG
- [7]
-
[8]
M. Deshpande and G. Karypis. 2004. Item-based top-n reco mmendation algorithms. ACM Transactions on Information Systems (TOIS) 22, 1 (2004), 143–177
work page 2004
-
[9]
H. Hu, R. Zhu, Y. Wang, W. Feng, X. Tan, and J. Huang. 2018. A Best Match KNN-based Approach for Large-scale Product Categori zation. In SIGIR eCommerce Data Challenge
work page 2018
-
[10]
Billion-scale similarity search with GPUs
J. Johnson, M. Douze, and H. Jégou. 2017. Billion-scale s imilarity search with GPUs. arXiv preprint arXiv:1702.08734 (2017)
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[11]
E. Klinger and D. Starkweather. 2010. pHash–the open source perceptual hash library . Technical Report. accessed 2016-05-19.[Online]. Availa ble: http://www. phash. org/apps
work page 2010
- [12]
-
[13]
A. Magnani, F. Liu, M. Xie, and S. Banerjee. 2019. Neural Product Retrieval at Walmart. com. In WWW Workshop on eCommerce and NLP . ACM
work page 2019
- [14]
-
[15]
C. Mu, J. Zhao, G. Yang, B. Yang, and Z. Yan. 2019. Empower ing Elastic- search with Exact and Fast r -Neighbor Search in Hamming Space. arXiv preprint arXiv:1902.08498 (2019)
work page internal anchor Pith review Pith/arXiv arXiv 2019
-
[16]
C. Mu, J. Zhao, G. Yang, B. Yang, and Z. Yan. 2019. Fast and Exact Near- est Neighbor Search in Hamming Space on Full-Text Search Eng ines. In SISAP
work page 2019
-
[17]
C. Mu, J. Zhao, G. Yang, J. Zhang, and Z. Yan. 2018. Toward s Practical Vi- sual Search Engine Within Elasticsearch. In SIGIR eCommerce Workshop
work page 2018
- [18]
-
[19]
A. Raghava-Raju. 2017. Predicting Fraud in Electronic Commerce: Fraud Detection Techniques in E-Commerce. International Journal of Computer Applications 171, 2 (2017)
work page 2017
-
[20]
M. Ruzicka, V. Novotny, P. Sojka, J. Pomikalek, and R. Re hurek. 2017. Flex- ible Similarity Search of Semantic Vectors Using Fulltext S earch Engines. In ISWC HSSUES Workshop
work page 2017
-
[21]
J. Rygl, J. Pomikalek, R. Rehurek, M. Ruzicka, V. Novotn y, and P. Sojka
-
[22]
Semantic Vector Encoding and Similarity Search Using Fulltext Search Engines. In RepL4NLP Workshop
-
[23]
J. Wang, J. Wang, G. Zeng, Z. Tu, R. Gan, and S. Li. 2012. Sc alable k-nn graph construction for visual descriptors. In CVPR
work page 2012
-
[24]
F. Yang, A. Kale, Y. Bubnov, L. Stein, Q. Wang, H. Kiapour , and R. Pira- muthu. 2017. Visual search at ebay. In KDD. 3
work page 2017
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.