Wavelet Forests Revisited

Dominik Kempa; Eric Chiu

arxiv: 2604.11338 · v1 · submitted 2026-04-13 · 💻 cs.DS

Wavelet Forests Revisited

Eric Chiu , Dominik Kempa This is my paper

Pith reviewed 2026-05-10 16:31 UTC · model grok-4.3

classification 💻 cs.DS

keywords wavelet forestsselect queriesrank querieswavelet treescompressed data structuresfixed-block partitioningsequence indexes

0 comments

The pith

Wavelet forests add select query support with little extra space and stay practically efficient.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper extends wavelet forests to support select queries in addition to rank. Wavelet forests partition an input sequence into fixed-size blocks and assign a separate wavelet tree to each block. The extension adds only modest space overhead, and experiments across repetitive and non-repetitive inputs show the resulting structures are competitive with or faster than conventional full wavelet trees. The work also measures how superblock size and navigational data choices influence select performance. Readers care because rank and select are core primitives in many compressed sequence indexes.

Core claim

Wavelet forests partition the input sequence into fixed-size blocks and build a separate wavelet tree for each block. We extend this structure to support select queries. Select support can be added with little additional space overhead and the resulting structures remain practically efficient. In experiments on a range of non-repetitive and repetitive inputs, wavelet forests are competitive with, and in most cases outperform, standalone wavelet-tree implementations. We also study the effect of internal parameters, including superblock size and navigational data, on select-query performance.

What carries the argument

Fixed-block partitioning in wavelet forests, which creates one independent wavelet tree per block of the input sequence.

If this is right

Wavelet forests now support both rank and select with only small space growth over rank-only versions.
The block-based structures often run faster than full wavelet trees on the tested repetitive and non-repetitive inputs.
Select performance can be adjusted by changing superblock size and the type of navigational data stored.
The same partitioning approach works across both repetitive and non-repetitive sequences.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

These structures could replace full wavelet trees inside larger compressed text indexes to lower memory use while preserving query speed.
Systematic tuning of superblock size for a given input type may further improve select times beyond the manual settings tested.
The fixed-block idea might extend to additional sequence operations if similar lightweight navigational aids can be designed.

Load-bearing premise

The fixed-block partitioning and navigational data choices that worked for rank queries transfer to select queries without hidden performance cliffs on inputs outside the tested set.

What would settle it

A timing and space measurement of select queries on a large input sequence whose repetitiveness or alphabet size differs markedly from the paper's test sets, checking whether overhead stays under roughly five percent or query speed remains competitive.

read the original abstract

Rank and select queries are basic operations on sequences, with applications in compressed text indexes and other space-efficient data structures. One of the standard data structures supporting these queries is the wavelet tree. In this paper, we study wavelet forests, that is, wavelet-tree structures based on the fixed-block compression boosting technique. Such structures partition the input sequence into fixed-size blocks and build a separate wavelet tree for each block. Previous work showed that this approach yields strong practical performance for rank queries. We extend wavelet forests to support select queries. We show that select support can be added with little additional space overhead and that the resulting structures remain practically efficient. In experiments on a range of non-repetitive and repetitive inputs, wavelet forests are competitive with, and in most cases outperform, standalone wavelet-tree implementations. We also study the effect of internal parameters, including superblock size and navigational data, on select-query performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Wavelet forests get select support added with small space cost and stay competitive in the reported experiments.

read the letter

The main point is that wavelet forests can be extended to support select queries with only minor extra space, and the resulting structures hold up well against standard wavelet trees on the inputs tested. This is a direct follow-on to earlier work that used fixed-block boosting for rank queries. The authors adapt the partitioning and add navigational data so that select becomes possible without much overhead. They then run comparisons on both repetitive and non-repetitive sequences and check how superblock size and navigational choices affect speed. In most cases the forests match or beat plain wavelet trees. Rank and select sit at the core of compressed indexes used in text search and bioinformatics, so a practical variant that keeps space low and speed reasonable is worth knowing about. The experiments give concrete numbers on a range of inputs and include a parameter study, which helps readers see where the approach works. The soft spots are limited. Performance still depends on the specific inputs and tuning choices, so there may be cases outside the tested set where the gains shrink or disappear. The abstract does not spell out every implementation detail or show error bars, which makes it harder to judge how stable the wins are. Nothing in the claims looks internally inconsistent or circular, though. This paper is aimed at people who build or tune compressed sequence indexes. It shows clear, incremental thinking on an established technique and backs the extension with experiments rather than just theory. It deserves a serious referee because the contribution is testable and the evidence is presented directly. I would send it out for peer review.

Referee Report

1 major / 2 minor

Summary. The paper extends wavelet forests—wavelet-tree structures that partition the input sequence into fixed-size blocks and build a separate wavelet tree per block—to support select queries in addition to rank. It claims that select support can be added with little additional space overhead, that the resulting structures remain practically efficient, and that they are competitive with or outperform standalone wavelet-tree implementations on a range of non-repetitive and repetitive inputs. The work also examines the effects of internal parameters such as superblock size and navigational data on select-query performance.

Significance. If the experimental claims hold, the result is a practical improvement to a standard building block for compressed text indexes and other space-efficient sequence data structures. The focus on adding select support without substantial overhead, combined with empirical evaluation across input classes, addresses a useful gap between theoretical wavelet trees and deployable implementations.

major comments (1)

Experimental section: the central performance claim (select support added with little overhead while remaining competitive) rests on experiments whose full methodology, error bars, implementation details, and per-query space/time breakdowns are not supplied in sufficient detail to allow independent verification or assessment of robustness outside the tested inputs.

minor comments (2)

Abstract: the description of the experimental inputs could be more precise (e.g., naming the concrete data sets or their characteristics) so readers can immediately gauge the scope of the competitiveness claim.
Notation: ensure that the space-overhead formulas for the added select structures are stated explicitly and contrasted with the rank-only case, preferably with a table or equation reference.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive review and the recommendation of minor revision. We address the single major comment below.

read point-by-point responses

Referee: Experimental section: the central performance claim (select support added with little overhead while remaining competitive) rests on experiments whose full methodology, error bars, implementation details, and per-query space/time breakdowns are not supplied in sufficient detail to allow independent verification or assessment of robustness outside the tested inputs.

Authors: We agree that greater detail in the experimental section would improve reproducibility and allow readers to better assess robustness. In the revised manuscript we will expand the section to include: (1) a more complete description of the implementation, covering the exact representation of navigational data, the select-support structures, and the block/superblock layout; (2) hardware and software environment details together with the number of repetitions performed for each timing measurement; (3) error bars or standard deviations on all reported running times and space figures; and (4) additional per-query space and time breakdowns for the dominant operations. These additions will directly address the concerns raised while preserving the existing experimental narrative. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper extends wavelet forests to support select queries by adding navigational structures and reports that this incurs little space overhead while preserving practical efficiency. These claims rest on new experimental benchmarks across non-repetitive and repetitive inputs rather than on any closed-form derivation, fitted parameter, or self-citation chain that reduces the result to its own inputs by construction. Space-overhead formulas and parameter choices (superblock size, navigational data) are presented as design decisions whose performance impact is measured directly, not predicted from previously fitted quantities. No load-bearing self-citation or uniqueness theorem is invoked to force the outcome; the central result is therefore self-contained and externally falsifiable via re-implementation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The work relies on standard assumptions of sequence data structures and compressed indexes; no new free parameters, axioms, or invented entities are introduced beyond the choice of block sizes already studied in prior work.

pith-pipeline@v0.9.0 · 5438 in / 947 out tokens · 37847 ms · 2026-05-10T16:31:22.507558+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages

[1]

URL:https://commoncrawl.org

1Common crawl. URL:https://commoncrawl.org. 2 Maxim A. Babenko, Pawel Gawrychowski, Tomasz Kociumaka, and Tatiana Starikovskaya. Wavelet trees meet suffix trees. In Piotr Indyk, editor,Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2015), pages 572–591. SIAM,

work page 2015
[2]

doi:10.1137/1.9781611973730.39. E. Chiu and D. Kempa 9 Figure 4Effect of superblock size on the select-query performance of wavelet forests. Time is measured in microseconds, and space is expressed as a percentage of the original text size. 3 Michael Burrows. A block-sorting lossless data compression algorithm.SRS Research Report, 124,

work page doi:10.1137/1.9781611973730.39
[3]

Faster wavelet tree queries

4 Matteo Ceregini, Florian Kurpicz, and Rossano Venturini. Faster wavelet tree queries. In Ali Bilgin, James E. Fowler, Joan Serra-Sagristà, Yan Ye, and James A. Storer, editors, Proceedings of the 2024 Data Compression Conference (DCC 2024), pages 223–232. IEEE, 2024.doi:10.1109/DCC58796.2024.00030. 5 Eric Chiu and Dominik Kempa. Fast select queries usin...

work page doi:10.1109/dcc58796.2024.00030 2024
[4]

7 Patrick Dinklage, Jonas Ellert, Johannes Fischer, Florian Kurpicz, and Marvin Löbel

doi: 10.1016/J.IS.2014.06.002. 7 Patrick Dinklage, Jonas Ellert, Johannes Fischer, Florian Kurpicz, and Marvin Löbel. Practical wavelet tree construction.ACM Journal of Experimental Algorithmics, 26:1.8:1–1.8:67,

work page doi:10.1016/j.is.2014.06.002 2014
[5]

8 Patrick Dinklage, Johannes Fischer, and Florian Kurpicz

doi:10.1145/3457197. 8 Patrick Dinklage, Johannes Fischer, and Florian Kurpicz. Constructing the wavelet tree and wavelet matrix in distributed memory. In Guy E. Blelloch and Irene Finocchi, editors, Proceedings of the 2020 Symposium on Algorithm Engineering and Experiments (ALENEX 2020), pages 214–228. SIAM, 2020.doi:10.1137/1.9781611976007.17. 9 Patrick...

work page doi:10.1145/3457197 2020
[6]

15 Travis Gagie, Gonzalo Navarro, and Simon J

doi:10.1145/3375890. 15 Travis Gagie, Gonzalo Navarro, and Simon J. Puglisi. New algorithms on wavelet trees and applications to information retrieval.Theoretical Computer Science, 426:25–41,

work page doi:10.1145/3375890
[7]

16 Simon Gog.Compressed suffix trees: design, construction, and applications

doi:10.1016/J.TCS.2011.12.002. 16 Simon Gog.Compressed suffix trees: design, construction, and applications. PhD thesis, University of Ulm,

work page doi:10.1016/j.tcs.2011.12.002 2011
[8]

From theory to practice: Plug and play with succinct data structures

17 Simon Gog, Timo Beller, Alistair Moffat, and Matthias Petri. From theory to practice: Plug and play with succinct data structures. In Joachim Gudmundsson and Jyrki Katajainen, editors,Proceedings of the 13th International Symposium on Experimental Algorithms (SEA 2014), pages 326–337. Springer, 2014.doi:10.1007/978-3-319-07959-2\_28. 18 Simon Gog, Juha...

work page doi:10.1007/978-3-319-07959-2 2014
[9]

Finding topological subgraphs

doi:10.1007/978-1-4939-2864-4\_642. 20 Roberto Grossi, Ankur Gupta, and Jeffrey Scott Vitter. High-order entropy-compressed text indexes. InProceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2003), pages 841–850. ACM/SIAM,

work page doi:10.1007/978-1-4939-2864-4 2003
[10]

21 Roberto Grossi and Jeffrey Scott Vitter

URL:http://dl.acm.org/citation.cfm? id=644108.644250. 21 Roberto Grossi and Jeffrey Scott Vitter. Compressed suffix arrays and suffix trees with applications to text indexing and string matching.SIAM Journal on Computing, 35(2):378– 407, 2005.doi:10.1137/S0097539702402354. 22 Roberto Grossi, Jeffrey Scott Vitter, and Bojian Xu. Wavelet trees: From theory ...

work page doi:10.1137/s0097539702402354 2005
[11]

29 Veli Mäkinen and Gonzalo Navarro

doi: 10.1016/J.JDA.2017.04.001. 29 Veli Mäkinen and Gonzalo Navarro. Succinct suffix arrays based on run-length encoding. Nordic Journal of Computing, 12(1):40–66,

work page doi:10.1016/j.jda.2017.04.001 2017
[12]

Ian Munro, Yakov Nekrich, and Jeffrey Scott Vitter

30 J. Ian Munro, Yakov Nekrich, and Jeffrey Scott Vitter. Fast construction of wavelet trees. Theoretical Computer Science, 638:91–97, 2016.doi:10.1016/J.TCS.2015.11.011. 31 Gonzalo Navarro. Wavelet trees for all.Journal of Discrete Algorithms, 25:2–20,

work page doi:10.1016/j.tcs.2015.11.011 2016
[13]

32 Gonzalo Navarro.Compact data structures: A practical approach

doi:10.1016/J.JDA.2013.07.004. 32 Gonzalo Navarro.Compact data structures: A practical approach. Cambridge University Press, Cambridge, UK, 2016.doi:10.1017/cbo9781316588284. 33 Project Gutenberg. Project Gutenberg. Accessed: 2025-11-03. URL:https://www.gutenberg. org/. 34 Rajeev Raman, Venkatesh Raman, and Srinivasa Rao Satti. Succinct indexable dictiona...

work page doi:10.1016/j.jda.2013.07.004 2013

[1] [1]

URL:https://commoncrawl.org

1Common crawl. URL:https://commoncrawl.org. 2 Maxim A. Babenko, Pawel Gawrychowski, Tomasz Kociumaka, and Tatiana Starikovskaya. Wavelet trees meet suffix trees. In Piotr Indyk, editor,Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2015), pages 572–591. SIAM,

work page 2015

[2] [2]

doi:10.1137/1.9781611973730.39. E. Chiu and D. Kempa 9 Figure 4Effect of superblock size on the select-query performance of wavelet forests. Time is measured in microseconds, and space is expressed as a percentage of the original text size. 3 Michael Burrows. A block-sorting lossless data compression algorithm.SRS Research Report, 124,

work page doi:10.1137/1.9781611973730.39

[3] [3]

Faster wavelet tree queries

4 Matteo Ceregini, Florian Kurpicz, and Rossano Venturini. Faster wavelet tree queries. In Ali Bilgin, James E. Fowler, Joan Serra-Sagristà, Yan Ye, and James A. Storer, editors, Proceedings of the 2024 Data Compression Conference (DCC 2024), pages 223–232. IEEE, 2024.doi:10.1109/DCC58796.2024.00030. 5 Eric Chiu and Dominik Kempa. Fast select queries usin...

work page doi:10.1109/dcc58796.2024.00030 2024

[4] [4]

7 Patrick Dinklage, Jonas Ellert, Johannes Fischer, Florian Kurpicz, and Marvin Löbel

doi: 10.1016/J.IS.2014.06.002. 7 Patrick Dinklage, Jonas Ellert, Johannes Fischer, Florian Kurpicz, and Marvin Löbel. Practical wavelet tree construction.ACM Journal of Experimental Algorithmics, 26:1.8:1–1.8:67,

work page doi:10.1016/j.is.2014.06.002 2014

[5] [5]

8 Patrick Dinklage, Johannes Fischer, and Florian Kurpicz

doi:10.1145/3457197. 8 Patrick Dinklage, Johannes Fischer, and Florian Kurpicz. Constructing the wavelet tree and wavelet matrix in distributed memory. In Guy E. Blelloch and Irene Finocchi, editors, Proceedings of the 2020 Symposium on Algorithm Engineering and Experiments (ALENEX 2020), pages 214–228. SIAM, 2020.doi:10.1137/1.9781611976007.17. 9 Patrick...

work page doi:10.1145/3457197 2020

[6] [6]

15 Travis Gagie, Gonzalo Navarro, and Simon J

doi:10.1145/3375890. 15 Travis Gagie, Gonzalo Navarro, and Simon J. Puglisi. New algorithms on wavelet trees and applications to information retrieval.Theoretical Computer Science, 426:25–41,

work page doi:10.1145/3375890

[7] [7]

16 Simon Gog.Compressed suffix trees: design, construction, and applications

doi:10.1016/J.TCS.2011.12.002. 16 Simon Gog.Compressed suffix trees: design, construction, and applications. PhD thesis, University of Ulm,

work page doi:10.1016/j.tcs.2011.12.002 2011

[8] [8]

From theory to practice: Plug and play with succinct data structures

17 Simon Gog, Timo Beller, Alistair Moffat, and Matthias Petri. From theory to practice: Plug and play with succinct data structures. In Joachim Gudmundsson and Jyrki Katajainen, editors,Proceedings of the 13th International Symposium on Experimental Algorithms (SEA 2014), pages 326–337. Springer, 2014.doi:10.1007/978-3-319-07959-2\_28. 18 Simon Gog, Juha...

work page doi:10.1007/978-3-319-07959-2 2014

[9] [9]

Finding topological subgraphs

doi:10.1007/978-1-4939-2864-4\_642. 20 Roberto Grossi, Ankur Gupta, and Jeffrey Scott Vitter. High-order entropy-compressed text indexes. InProceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2003), pages 841–850. ACM/SIAM,

work page doi:10.1007/978-1-4939-2864-4 2003

[10] [10]

21 Roberto Grossi and Jeffrey Scott Vitter

URL:http://dl.acm.org/citation.cfm? id=644108.644250. 21 Roberto Grossi and Jeffrey Scott Vitter. Compressed suffix arrays and suffix trees with applications to text indexing and string matching.SIAM Journal on Computing, 35(2):378– 407, 2005.doi:10.1137/S0097539702402354. 22 Roberto Grossi, Jeffrey Scott Vitter, and Bojian Xu. Wavelet trees: From theory ...

work page doi:10.1137/s0097539702402354 2005

[11] [11]

29 Veli Mäkinen and Gonzalo Navarro

doi: 10.1016/J.JDA.2017.04.001. 29 Veli Mäkinen and Gonzalo Navarro. Succinct suffix arrays based on run-length encoding. Nordic Journal of Computing, 12(1):40–66,

work page doi:10.1016/j.jda.2017.04.001 2017

[12] [12]

Ian Munro, Yakov Nekrich, and Jeffrey Scott Vitter

30 J. Ian Munro, Yakov Nekrich, and Jeffrey Scott Vitter. Fast construction of wavelet trees. Theoretical Computer Science, 638:91–97, 2016.doi:10.1016/J.TCS.2015.11.011. 31 Gonzalo Navarro. Wavelet trees for all.Journal of Discrete Algorithms, 25:2–20,

work page doi:10.1016/j.tcs.2015.11.011 2016

[13] [13]

32 Gonzalo Navarro.Compact data structures: A practical approach

doi:10.1016/J.JDA.2013.07.004. 32 Gonzalo Navarro.Compact data structures: A practical approach. Cambridge University Press, Cambridge, UK, 2016.doi:10.1017/cbo9781316588284. 33 Project Gutenberg. Project Gutenberg. Accessed: 2025-11-03. URL:https://www.gutenberg. org/. 34 Rajeev Raman, Venkatesh Raman, and Srinivasa Rao Satti. Succinct indexable dictiona...

work page doi:10.1016/j.jda.2013.07.004 2013