Wavelet Forests Revisited
Pith reviewed 2026-05-10 16:31 UTC · model grok-4.3
The pith
Wavelet forests add select query support with little extra space and stay practically efficient.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Wavelet forests partition the input sequence into fixed-size blocks and build a separate wavelet tree for each block. We extend this structure to support select queries. Select support can be added with little additional space overhead and the resulting structures remain practically efficient. In experiments on a range of non-repetitive and repetitive inputs, wavelet forests are competitive with, and in most cases outperform, standalone wavelet-tree implementations. We also study the effect of internal parameters, including superblock size and navigational data, on select-query performance.
What carries the argument
Fixed-block partitioning in wavelet forests, which creates one independent wavelet tree per block of the input sequence.
If this is right
- Wavelet forests now support both rank and select with only small space growth over rank-only versions.
- The block-based structures often run faster than full wavelet trees on the tested repetitive and non-repetitive inputs.
- Select performance can be adjusted by changing superblock size and the type of navigational data stored.
- The same partitioning approach works across both repetitive and non-repetitive sequences.
Where Pith is reading between the lines
- These structures could replace full wavelet trees inside larger compressed text indexes to lower memory use while preserving query speed.
- Systematic tuning of superblock size for a given input type may further improve select times beyond the manual settings tested.
- The fixed-block idea might extend to additional sequence operations if similar lightweight navigational aids can be designed.
Load-bearing premise
The fixed-block partitioning and navigational data choices that worked for rank queries transfer to select queries without hidden performance cliffs on inputs outside the tested set.
What would settle it
A timing and space measurement of select queries on a large input sequence whose repetitiveness or alphabet size differs markedly from the paper's test sets, checking whether overhead stays under roughly five percent or query speed remains competitive.
read the original abstract
Rank and select queries are basic operations on sequences, with applications in compressed text indexes and other space-efficient data structures. One of the standard data structures supporting these queries is the wavelet tree. In this paper, we study wavelet forests, that is, wavelet-tree structures based on the fixed-block compression boosting technique. Such structures partition the input sequence into fixed-size blocks and build a separate wavelet tree for each block. Previous work showed that this approach yields strong practical performance for rank queries. We extend wavelet forests to support select queries. We show that select support can be added with little additional space overhead and that the resulting structures remain practically efficient. In experiments on a range of non-repetitive and repetitive inputs, wavelet forests are competitive with, and in most cases outperform, standalone wavelet-tree implementations. We also study the effect of internal parameters, including superblock size and navigational data, on select-query performance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper extends wavelet forests—wavelet-tree structures that partition the input sequence into fixed-size blocks and build a separate wavelet tree per block—to support select queries in addition to rank. It claims that select support can be added with little additional space overhead, that the resulting structures remain practically efficient, and that they are competitive with or outperform standalone wavelet-tree implementations on a range of non-repetitive and repetitive inputs. The work also examines the effects of internal parameters such as superblock size and navigational data on select-query performance.
Significance. If the experimental claims hold, the result is a practical improvement to a standard building block for compressed text indexes and other space-efficient sequence data structures. The focus on adding select support without substantial overhead, combined with empirical evaluation across input classes, addresses a useful gap between theoretical wavelet trees and deployable implementations.
major comments (1)
- Experimental section: the central performance claim (select support added with little overhead while remaining competitive) rests on experiments whose full methodology, error bars, implementation details, and per-query space/time breakdowns are not supplied in sufficient detail to allow independent verification or assessment of robustness outside the tested inputs.
minor comments (2)
- Abstract: the description of the experimental inputs could be more precise (e.g., naming the concrete data sets or their characteristics) so readers can immediately gauge the scope of the competitiveness claim.
- Notation: ensure that the space-overhead formulas for the added select structures are stated explicitly and contrasted with the rank-only case, preferably with a table or equation reference.
Simulated Author's Rebuttal
We thank the referee for the constructive review and the recommendation of minor revision. We address the single major comment below.
read point-by-point responses
-
Referee: Experimental section: the central performance claim (select support added with little overhead while remaining competitive) rests on experiments whose full methodology, error bars, implementation details, and per-query space/time breakdowns are not supplied in sufficient detail to allow independent verification or assessment of robustness outside the tested inputs.
Authors: We agree that greater detail in the experimental section would improve reproducibility and allow readers to better assess robustness. In the revised manuscript we will expand the section to include: (1) a more complete description of the implementation, covering the exact representation of navigational data, the select-support structures, and the block/superblock layout; (2) hardware and software environment details together with the number of repetitions performed for each timing measurement; (3) error bars or standard deviations on all reported running times and space figures; and (4) additional per-query space and time breakdowns for the dominant operations. These additions will directly address the concerns raised while preserving the existing experimental narrative. revision: yes
Circularity Check
No significant circularity
full rationale
The paper extends wavelet forests to support select queries by adding navigational structures and reports that this incurs little space overhead while preserving practical efficiency. These claims rest on new experimental benchmarks across non-repetitive and repetitive inputs rather than on any closed-form derivation, fitted parameter, or self-citation chain that reduces the result to its own inputs by construction. Space-overhead formulas and parameter choices (superblock size, navigational data) are presented as design decisions whose performance impact is measured directly, not predicted from previously fitted quantities. No load-bearing self-citation or uniqueness theorem is invoked to force the outcome; the central result is therefore self-contained and externally falsifiable via re-implementation.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
1Common crawl. URL:https://commoncrawl.org. 2 Maxim A. Babenko, Pawel Gawrychowski, Tomasz Kociumaka, and Tatiana Starikovskaya. Wavelet trees meet suffix trees. In Piotr Indyk, editor,Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2015), pages 572–591. SIAM,
work page 2015
-
[2]
doi:10.1137/1.9781611973730.39. E. Chiu and D. Kempa 9 Figure 4Effect of superblock size on the select-query performance of wavelet forests. Time is measured in microseconds, and space is expressed as a percentage of the original text size. 3 Michael Burrows. A block-sorting lossless data compression algorithm.SRS Research Report, 124,
-
[3]
4 Matteo Ceregini, Florian Kurpicz, and Rossano Venturini. Faster wavelet tree queries. In Ali Bilgin, James E. Fowler, Joan Serra-Sagristà, Yan Ye, and James A. Storer, editors, Proceedings of the 2024 Data Compression Conference (DCC 2024), pages 223–232. IEEE, 2024.doi:10.1109/DCC58796.2024.00030. 5 Eric Chiu and Dominik Kempa. Fast select queries usin...
-
[4]
7 Patrick Dinklage, Jonas Ellert, Johannes Fischer, Florian Kurpicz, and Marvin Löbel
doi: 10.1016/J.IS.2014.06.002. 7 Patrick Dinklage, Jonas Ellert, Johannes Fischer, Florian Kurpicz, and Marvin Löbel. Practical wavelet tree construction.ACM Journal of Experimental Algorithmics, 26:1.8:1–1.8:67,
-
[5]
8 Patrick Dinklage, Johannes Fischer, and Florian Kurpicz
doi:10.1145/3457197. 8 Patrick Dinklage, Johannes Fischer, and Florian Kurpicz. Constructing the wavelet tree and wavelet matrix in distributed memory. In Guy E. Blelloch and Irene Finocchi, editors, Proceedings of the 2020 Symposium on Algorithm Engineering and Experiments (ALENEX 2020), pages 214–228. SIAM, 2020.doi:10.1137/1.9781611976007.17. 9 Patrick...
-
[6]
15 Travis Gagie, Gonzalo Navarro, and Simon J
doi:10.1145/3375890. 15 Travis Gagie, Gonzalo Navarro, and Simon J. Puglisi. New algorithms on wavelet trees and applications to information retrieval.Theoretical Computer Science, 426:25–41,
-
[7]
16 Simon Gog.Compressed suffix trees: design, construction, and applications
doi:10.1016/J.TCS.2011.12.002. 16 Simon Gog.Compressed suffix trees: design, construction, and applications. PhD thesis, University of Ulm,
-
[8]
From theory to practice: Plug and play with succinct data structures
17 Simon Gog, Timo Beller, Alistair Moffat, and Matthias Petri. From theory to practice: Plug and play with succinct data structures. In Joachim Gudmundsson and Jyrki Katajainen, editors,Proceedings of the 13th International Symposium on Experimental Algorithms (SEA 2014), pages 326–337. Springer, 2014.doi:10.1007/978-3-319-07959-2\_28. 18 Simon Gog, Juha...
-
[9]
doi:10.1007/978-1-4939-2864-4\_642. 20 Roberto Grossi, Ankur Gupta, and Jeffrey Scott Vitter. High-order entropy-compressed text indexes. InProceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2003), pages 841–850. ACM/SIAM,
-
[10]
21 Roberto Grossi and Jeffrey Scott Vitter
URL:http://dl.acm.org/citation.cfm? id=644108.644250. 21 Roberto Grossi and Jeffrey Scott Vitter. Compressed suffix arrays and suffix trees with applications to text indexing and string matching.SIAM Journal on Computing, 35(2):378– 407, 2005.doi:10.1137/S0097539702402354. 22 Roberto Grossi, Jeffrey Scott Vitter, and Bojian Xu. Wavelet trees: From theory ...
-
[11]
29 Veli Mäkinen and Gonzalo Navarro
doi: 10.1016/J.JDA.2017.04.001. 29 Veli Mäkinen and Gonzalo Navarro. Succinct suffix arrays based on run-length encoding. Nordic Journal of Computing, 12(1):40–66,
-
[12]
Ian Munro, Yakov Nekrich, and Jeffrey Scott Vitter
30 J. Ian Munro, Yakov Nekrich, and Jeffrey Scott Vitter. Fast construction of wavelet trees. Theoretical Computer Science, 638:91–97, 2016.doi:10.1016/J.TCS.2015.11.011. 31 Gonzalo Navarro. Wavelet trees for all.Journal of Discrete Algorithms, 25:2–20,
-
[13]
32 Gonzalo Navarro.Compact data structures: A practical approach
doi:10.1016/J.JDA.2013.07.004. 32 Gonzalo Navarro.Compact data structures: A practical approach. Cambridge University Press, Cambridge, UK, 2016.doi:10.1017/cbo9781316588284. 33 Project Gutenberg. Project Gutenberg. Accessed: 2025-11-03. URL:https://www.gutenberg. org/. 34 Rajeev Raman, Venkatesh Raman, and Srinivasa Rao Satti. Succinct indexable dictiona...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.