On Slicing Sorted Integer Sequences
Pith reviewed 2026-05-25 11:17 UTC · model grok-4.3
The pith
Recursive slicing of the number universe for sorted integer sequences yields faster list intersections and unions than cardinality partitioning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A solution that recursively slices the universe of representation of a sequence achieves compact storage and fast query execution. Albeit larger than some state-of-the-art representations, this slicing approach substantially improves the performance of list intersections and unions while operating in compressed space, offering an excellent space/time trade-off.
What carries the argument
Recursive universe slicing, which partitions the sequence by repeatedly dividing the range of possible values to balance storage and access speed.
If this is right
- List intersections and unions execute faster in compressed space.
- Space/time trade-off improves for inverted index compression problems.
- Partitioning by universe receives more attention as a viable alternative to cardinality partitioning.
- Query resolution in large-scale retrieval systems can benefit from this method.
Where Pith is reading between the lines
- Integrating this into existing search engines could reduce query latency without increasing memory much.
- The recursive slicing might apply to other data structures involving sorted integers, such as in databases.
- Further optimizations could combine this with other compression techniques for even better results.
Load-bearing premise
That implementing and testing the recursive slicing will show measurable speed gains over current methods for list operations.
What would settle it
Running the slicing method on standard inverted index benchmarks and finding no speedup or even slowdown in intersection and union times compared to cardinality-based approaches.
Figures
read the original abstract
Representing sorted integer sequences in small space is a central problem for large-scale retrieval systems such as Web search engines. Efficient query resolution, e.g., intersection or random access, is achieved by carefully partitioning the sequences. In this work we describe and compare two different partitioning paradigms: partitioning by cardinality and partitioning by universe. Although the ideas behind such paradigms have been known in the coding and algorithmic community since many years, inverted index compression has extensively adopted the former paradigm, whereas the latter has received only little attention. As a result, an experimental comparison between these two is missing for the setting of inverted index compression. We also propose and implement a solution that recursively slices the universe of representation of a sequence to achieve compact storage and attain to fast query execution. Albeit larger than some state-of-the-art representations, this slicing approach substantially improves the performance of list intersections and unions while operating in compressed space, thus offering an excellent space/time trade-off for the problem.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript compares cardinality-based and universe-based partitioning paradigms for representing sorted integer sequences in inverted indexes. It proposes and implements a recursive universe-slicing technique, presents an experimental comparison (previously missing from the literature), and claims that the slicing method, while larger than some state-of-the-art representations, delivers substantially faster intersections and unions in compressed space and therefore an excellent space/time trade-off.
Significance. If the reported speedups hold under standard inverted-index workloads and baselines, the work supplies the first direct experimental comparison of the two partitioning paradigms and introduces a practical alternative that could improve query latency in large-scale retrieval systems without sacrificing compression.
major comments (1)
- The central performance claim rests entirely on the new experiments for recursive slicing; the abstract itself notes that 'an experimental comparison between these two is missing' and that the paradigm 'has received only little attention'. Without the full experimental section (datasets, baselines, implementation details, and statistical significance), it is impossible to assess whether the reported gains are robust to baseline choice or implementation overhead.
Simulated Author's Rebuttal
We thank the referee for their review. We address the major comment point by point below.
read point-by-point responses
-
Referee: The central performance claim rests entirely on the new experiments for recursive slicing; the abstract itself notes that 'an experimental comparison between these two is missing' and that the paradigm 'has received only little attention'. Without the full experimental section (datasets, baselines, implementation details, and statistical significance), it is impossible to assess whether the reported gains are robust to baseline choice or implementation overhead.
Authors: The manuscript contains a full experimental section that details the datasets (standard TREC collections used for inverted-index evaluation), the baselines (representative methods from both cardinality-based and universe-based partitioning, including prior state-of-the-art codecs), the implementation of the recursive universe-slicing technique, and the measured intersection/union times in compressed space. This section supplies the direct experimental comparison that the abstract notes was previously absent from the literature. The reported speedups are therefore grounded in these concrete results rather than in the absence of experiments. revision: no
Circularity Check
No circularity: algorithmic proposal without derived quantities or self-referential steps
full rationale
The paper introduces a recursive universe-slicing technique for sorted integer sequences and contrasts cardinality-based vs. universe-based partitioning. No equations, fitted parameters, predictions, or first-principles derivations appear that could reduce to their own inputs by construction. The contribution consists of a new algorithmic method whose performance claims rest on implementation and experimental comparison rather than any self-definitional, fitted-input, or self-citation load-bearing chain. The abstract and description treat the approach as an independent engineering proposal whose value is assessed externally via runtime and space measurements.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Vo Ngoc Anh and Alistair Moffat. 2005. Inverted Index Compression Using Word-Aligned Binary Codes. Information Retrieval Journal 8, 1 (2005), 151–166
work page 2005
-
[2]
Vo Ngoc Anh and Alistair Moffat. 2010. Index compression using 64-bit words. Software: Practice and Experience 40, 2 (2010), 131–147. On Slicing Sorted Integer Sequences 19
work page 2010
-
[3]
Broder, David Carmel, Michael Herscovici, Aya Soffer, and Jason Y
Andrei Z. Broder, David Carmel, Michael Herscovici, Aya Soffer, and Jason Y. Zien. 2003. Efficient query evaluation using a two-level retrieval process. In Proceedings of the 12th ACM International Conference on Information and Knowledge Management. 426–434
work page 2003
-
[4]
Barla Cambazoglu and Ricardo Baeza-Yates
B. Barla Cambazoglu and Ricardo Baeza-Yates. 2015. Scalability Challenges in Web Search Engines . Morgan & Claypool Publishers
work page 2015
-
[5]
Samy Chambi, Daniel Lemire, Owen Kaser, and Robert Godin. 2016. Better bitmap performance with Roaring bitmaps. Software: practice and experience 46, 5 (2016), 709–719
work page 2016
-
[6]
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2009. Introduction to Algorithms (3rd ed.). MIT Press
work page 2009
-
[7]
Jeffrey Dean. 2009. Challenges in building large-scale information retrieval systems: invited talk. In Proceedings of the 2nd International Conference on Web Search and Data Mining
work page 2009
-
[8]
Peter Elias. 1974. Efficient Storage and Retrieval by Content and Address of Static Files. J. ACM 21, 2 (1974), 246–260
work page 1974
-
[9]
Peter Elias. 1975. Universal codeword sets and representations of the integers. IEEE Transactions on Information Theory 21, 2 (1975), 194–203
work page 1975
-
[10]
Robert Mario Fano. 1971. On the number of bits required to implement an associative memory. Memorandum 61, Computer Structures Group, MIT (1971)
work page 1971
-
[11]
Solomon Golomb. 1966. Run-length encodings. IEEE Transactions on Information Theory 12, 3 (1966), 399–401
work page 1966
-
[12]
Daniel Lemire and Leonid Boytsov. 2015. Decoding billions of integers per second through vectorization. 45, 1 (2015), 1–29
work page 2015
-
[13]
Daniel Lemire, Owen Kaser, Nathan Kurz, Luca Deri, Chris O’Hara, François Saint-Jacques, and Gregory Ssi-Yan-Kai
-
[14]
Software: Practice and Experience 48, 4 (2018), 867–895
Roaring bitmaps: Implementation of an optimized software library. Software: Practice and Experience 48, 4 (2018), 867–895
work page 2018
-
[15]
Daniel Lemire, Nathan Kurz, and Christoph Rupp. 2018. Stream-VByte: faster byte-oriented integer compression. Inform. Process. Lett. 130 (2018), 1–6
work page 2018
-
[16]
Daniel Lemire, Gregory Ssi-Yan-Kai, and Owen Kaser. 2016. Consistently faster and smaller compressed bitmaps with roaring. Software: Practice and Experience 46, 11 (2016), 1547–1569
work page 2016
-
[17]
Antonio Mallia, Giuseppe Ottaviano, Elia Porciani, Nicola Tonellotto, and Rossano Venturini. 2017. Faster BlockMax WAND with Variable-sized Blocks. In Proceedings of the International ACM Conference on Research and Development in Information Retrieval. 625–634
work page 2017
-
[18]
2008.Introduction to Information Retrieval
Christopher Manning, Prabhakar Raghavan, and Hinrich Schütze. 2008.Introduction to Information Retrieval. Cambridge University Press
work page 2008
-
[19]
Alistair Moffat. 2008. Compressing integer sequences and sets. In Encyclopedia of algorithms. Springer, 1–99
work page 2008
-
[20]
Alistair Moffat and Matthias Petri. 2017. ANS-Based Index Compression. In Proceedings of the ACM on Conference on Information and Knowledge Management . 677–686
work page 2017
-
[21]
Alistair Moffat and Matthias Petri. 2018. Index Compression Using Byte-Aligned ANS Coding and Two-Dimensional Contexts. In Proceedings of the 11th ACM International Conference on Web Search and Data Mining . 405–413
work page 2018
-
[22]
Alistair Moffat and Lang Stuiver. 1996. Exploiting Clustering in Inverted File Compression. In Data Compression Conference. 82–91
work page 1996
-
[23]
Alistair Moffat and Lang Stuiver. 2000. Binary Interpolative Coding for Effective Index Compression. Information Retrieval Journal 3, 1 (2000), 25–47
work page 2000
-
[24]
Giuseppe Ottaviano and Rossano Venturini. 2014. Partitioned Elias-Fano Indexes. InProceedings of the 37th International Conference on Research and Development in Information Retrieval . 273–282
work page 2014
-
[25]
Prashant Pandey, Michael A Bender, and Rob Johnson. 2017. A fast x86 implementation of select. arXiv preprint arXiv:1706.00990 (2017)
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[26]
Giulio Ermanno Pibiri, Matthias Petri, and Alistair Moffat. 2019. Fast Dictionary-Based Compression for Inverted Indexes. In International ACM Conference on Web Search and Data Mining . 9
work page 2019
-
[27]
Giulio Ermanno Pibiri and Rossano Venturini. 2017. Clustered Elias-Fano indexes. ACM Transactions on Information Systems 36, 1, Article 2 (2017), 33 pages
work page 2017
-
[28]
Giulio Ermanno Pibiri and Rossano Venturini. 2017. Dynamic Elias-Fano Representation. In Proceedings of the 28-th Annual Symposium on Combinatorial Pattern Matching . 30:1–30:14
work page 2017
-
[29]
Giulio Ermanno Pibiri and Rossano Venturini. 2018. Inverted Index Compression.Encyclopedia of Big Data Technologies (2018), 1–8. 20 Giulio Ermanno Pibiri
work page 2018
-
[30]
Giulio Ermanno Pibiri and Rossano Venturini. 2019. On Optimally Partitioning Variable-Byte Codes. IEEE Transactions on Knowledge and Data Engineering (2019), 1–12
work page 2019
-
[31]
Jeff Plaisance, Nathan Kurz, and Daniel Lemire. 2015. Vectorized VByte Decoding. In International Symposium on Web Algorithms
work page 2015
-
[32]
Stephen Robertson and Sparck Jones. 1976. Relevance weighting of search terms. Journal of the American Society for Information Science 27, 3 (1976), 129–146
work page 1976
-
[33]
Benjamin Schlegel, Thomas Willhalm, and Wolfgang Lehner. 2011. Fast Sorted-Set Intersection using SIMD Instructions.. In ADMS@ VLDB. 1–8
work page 2011
-
[34]
Falk Scholer, Hugh E Williams, John Yiannis, and Justin Zobel. 2002. Compression of inverted indexes for fast query evaluation. In Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 222–229
work page 2002
-
[35]
Fabrizio Silvestri. 2007. Sorting Out the Document Identifier Assignment Problem. In Proceedings of the 29th European Conference on IR Research . 101–112
work page 2007
-
[36]
Alexander Stepanov, Anil Gangolli, Daniel Rose, Ryan Ernst, and Paramjit Oberoi. 2011. SIMD-based decoding of posting lists. In Proceedings of the 20th International Conference on Information and Knowledge Management . 317–326
work page 2011
-
[37]
Larry H Thiel and HS Heaps. 1972. Program design for retrospective searches on large data bases. Information Storage and Retrieval 8, 1 (1972), 1–20
work page 1972
-
[38]
Andrew Trotman. 2014. Compression, SIMD, and postings lists. In Proceedings of the 2014 Australasian Document Computing Symposium. ACM, 50
work page 2014
-
[39]
Peter van Emde Boas. 1975. Preserving Order in a Forest in less than Logarithmic Time. In Proceedings of the 16-th Annual Symposium on Foundations of Computer Science . 75–84
work page 1975
-
[40]
Peter van Emde Boas. 1977. Preserving Order in a Forest in Less Than Logarithmic Time and Linear Space. Inform. Process. Lett. 6, 3 (1977), 80–82
work page 1977
-
[41]
Sebastiano Vigna. 2013. Quasi-succinct indices. In Proceedings of the 6th ACM International Conference on Web Search and Data Mining. 83–92
work page 2013
-
[42]
Jianguo Wang, Chunbin Lin, Yannis Papakonstantinou, and Steven Swanson. 2017. An experimental study of bitmap compression vs. inverted list compression. In Proceedings of the 2017 ACM International Conference on Management of Data. ACM, 993–1008
work page 2017
-
[43]
Ian Witten, Alistair Moffat, and Timothy Bell. 1999. Managing gigabytes: compressing and indexing documents and images (2nd ed.). Morgan Kaufmann
work page 1999
-
[44]
Hao Yan, Shuai Ding, and Torsten Suel. 2009. Inverted index compression and query processing with optimized document ordering. In Proceedings of the 18th International Conference on World Wide Web . 401–410
work page 2009
- [45]
-
[46]
Justin Zobel and Alistair Moffat. 2006. Inverted files for text search engines. Comput. Surveys 38, 2 (2006), 1–56
work page 2006
-
[47]
Marcin Zukowski, Sándor Héman, Niels Nes, and Peter Boncz. 2006. Super-Scalar RAM-CPU Cache Compression. In Proceedings of the 22nd International Conference on Data Engineering . 59–70
work page 2006
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.