pith. sign in

arxiv: 2604.11338 · v1 · submitted 2026-04-13 · 💻 cs.DS

Wavelet Forests Revisited

Pith reviewed 2026-05-10 16:31 UTC · model grok-4.3

classification 💻 cs.DS
keywords wavelet forestsselect queriesrank querieswavelet treescompressed data structuresfixed-block partitioningsequence indexes
0
0 comments X

The pith

Wavelet forests add select query support with little extra space and stay practically efficient.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper extends wavelet forests to support select queries in addition to rank. Wavelet forests partition an input sequence into fixed-size blocks and assign a separate wavelet tree to each block. The extension adds only modest space overhead, and experiments across repetitive and non-repetitive inputs show the resulting structures are competitive with or faster than conventional full wavelet trees. The work also measures how superblock size and navigational data choices influence select performance. Readers care because rank and select are core primitives in many compressed sequence indexes.

Core claim

Wavelet forests partition the input sequence into fixed-size blocks and build a separate wavelet tree for each block. We extend this structure to support select queries. Select support can be added with little additional space overhead and the resulting structures remain practically efficient. In experiments on a range of non-repetitive and repetitive inputs, wavelet forests are competitive with, and in most cases outperform, standalone wavelet-tree implementations. We also study the effect of internal parameters, including superblock size and navigational data, on select-query performance.

What carries the argument

Fixed-block partitioning in wavelet forests, which creates one independent wavelet tree per block of the input sequence.

If this is right

  • Wavelet forests now support both rank and select with only small space growth over rank-only versions.
  • The block-based structures often run faster than full wavelet trees on the tested repetitive and non-repetitive inputs.
  • Select performance can be adjusted by changing superblock size and the type of navigational data stored.
  • The same partitioning approach works across both repetitive and non-repetitive sequences.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • These structures could replace full wavelet trees inside larger compressed text indexes to lower memory use while preserving query speed.
  • Systematic tuning of superblock size for a given input type may further improve select times beyond the manual settings tested.
  • The fixed-block idea might extend to additional sequence operations if similar lightweight navigational aids can be designed.

Load-bearing premise

The fixed-block partitioning and navigational data choices that worked for rank queries transfer to select queries without hidden performance cliffs on inputs outside the tested set.

What would settle it

A timing and space measurement of select queries on a large input sequence whose repetitiveness or alphabet size differs markedly from the paper's test sets, checking whether overhead stays under roughly five percent or query speed remains competitive.

read the original abstract

Rank and select queries are basic operations on sequences, with applications in compressed text indexes and other space-efficient data structures. One of the standard data structures supporting these queries is the wavelet tree. In this paper, we study wavelet forests, that is, wavelet-tree structures based on the fixed-block compression boosting technique. Such structures partition the input sequence into fixed-size blocks and build a separate wavelet tree for each block. Previous work showed that this approach yields strong practical performance for rank queries. We extend wavelet forests to support select queries. We show that select support can be added with little additional space overhead and that the resulting structures remain practically efficient. In experiments on a range of non-repetitive and repetitive inputs, wavelet forests are competitive with, and in most cases outperform, standalone wavelet-tree implementations. We also study the effect of internal parameters, including superblock size and navigational data, on select-query performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper extends wavelet forests—wavelet-tree structures that partition the input sequence into fixed-size blocks and build a separate wavelet tree per block—to support select queries in addition to rank. It claims that select support can be added with little additional space overhead, that the resulting structures remain practically efficient, and that they are competitive with or outperform standalone wavelet-tree implementations on a range of non-repetitive and repetitive inputs. The work also examines the effects of internal parameters such as superblock size and navigational data on select-query performance.

Significance. If the experimental claims hold, the result is a practical improvement to a standard building block for compressed text indexes and other space-efficient sequence data structures. The focus on adding select support without substantial overhead, combined with empirical evaluation across input classes, addresses a useful gap between theoretical wavelet trees and deployable implementations.

major comments (1)
  1. Experimental section: the central performance claim (select support added with little overhead while remaining competitive) rests on experiments whose full methodology, error bars, implementation details, and per-query space/time breakdowns are not supplied in sufficient detail to allow independent verification or assessment of robustness outside the tested inputs.
minor comments (2)
  1. Abstract: the description of the experimental inputs could be more precise (e.g., naming the concrete data sets or their characteristics) so readers can immediately gauge the scope of the competitiveness claim.
  2. Notation: ensure that the space-overhead formulas for the added select structures are stated explicitly and contrasted with the rank-only case, preferably with a table or equation reference.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive review and the recommendation of minor revision. We address the single major comment below.

read point-by-point responses
  1. Referee: Experimental section: the central performance claim (select support added with little overhead while remaining competitive) rests on experiments whose full methodology, error bars, implementation details, and per-query space/time breakdowns are not supplied in sufficient detail to allow independent verification or assessment of robustness outside the tested inputs.

    Authors: We agree that greater detail in the experimental section would improve reproducibility and allow readers to better assess robustness. In the revised manuscript we will expand the section to include: (1) a more complete description of the implementation, covering the exact representation of navigational data, the select-support structures, and the block/superblock layout; (2) hardware and software environment details together with the number of repetitions performed for each timing measurement; (3) error bars or standard deviations on all reported running times and space figures; and (4) additional per-query space and time breakdowns for the dominant operations. These additions will directly address the concerns raised while preserving the existing experimental narrative. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper extends wavelet forests to support select queries by adding navigational structures and reports that this incurs little space overhead while preserving practical efficiency. These claims rest on new experimental benchmarks across non-repetitive and repetitive inputs rather than on any closed-form derivation, fitted parameter, or self-citation chain that reduces the result to its own inputs by construction. Space-overhead formulas and parameter choices (superblock size, navigational data) are presented as design decisions whose performance impact is measured directly, not predicted from previously fitted quantities. No load-bearing self-citation or uniqueness theorem is invoked to force the outcome; the central result is therefore self-contained and externally falsifiable via re-implementation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The work relies on standard assumptions of sequence data structures and compressed indexes; no new free parameters, axioms, or invented entities are introduced beyond the choice of block sizes already studied in prior work.

pith-pipeline@v0.9.0 · 5438 in / 947 out tokens · 37847 ms · 2026-05-10T16:31:22.507558+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages

  1. [1]

    URL:https://commoncrawl.org

    1Common crawl. URL:https://commoncrawl.org. 2 Maxim A. Babenko, Pawel Gawrychowski, Tomasz Kociumaka, and Tatiana Starikovskaya. Wavelet trees meet suffix trees. In Piotr Indyk, editor,Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2015), pages 572–591. SIAM,

  2. [2]

    doi:10.1137/1.9781611973730.39. E. Chiu and D. Kempa 9 Figure 4Effect of superblock size on the select-query performance of wavelet forests. Time is measured in microseconds, and space is expressed as a percentage of the original text size. 3 Michael Burrows. A block-sorting lossless data compression algorithm.SRS Research Report, 124,

  3. [3]

    Faster wavelet tree queries

    4 Matteo Ceregini, Florian Kurpicz, and Rossano Venturini. Faster wavelet tree queries. In Ali Bilgin, James E. Fowler, Joan Serra-Sagristà, Yan Ye, and James A. Storer, editors, Proceedings of the 2024 Data Compression Conference (DCC 2024), pages 223–232. IEEE, 2024.doi:10.1109/DCC58796.2024.00030. 5 Eric Chiu and Dominik Kempa. Fast select queries usin...

  4. [4]

    7 Patrick Dinklage, Jonas Ellert, Johannes Fischer, Florian Kurpicz, and Marvin Löbel

    doi: 10.1016/J.IS.2014.06.002. 7 Patrick Dinklage, Jonas Ellert, Johannes Fischer, Florian Kurpicz, and Marvin Löbel. Practical wavelet tree construction.ACM Journal of Experimental Algorithmics, 26:1.8:1–1.8:67,

  5. [5]

    8 Patrick Dinklage, Johannes Fischer, and Florian Kurpicz

    doi:10.1145/3457197. 8 Patrick Dinklage, Johannes Fischer, and Florian Kurpicz. Constructing the wavelet tree and wavelet matrix in distributed memory. In Guy E. Blelloch and Irene Finocchi, editors, Proceedings of the 2020 Symposium on Algorithm Engineering and Experiments (ALENEX 2020), pages 214–228. SIAM, 2020.doi:10.1137/1.9781611976007.17. 9 Patrick...

  6. [6]

    15 Travis Gagie, Gonzalo Navarro, and Simon J

    doi:10.1145/3375890. 15 Travis Gagie, Gonzalo Navarro, and Simon J. Puglisi. New algorithms on wavelet trees and applications to information retrieval.Theoretical Computer Science, 426:25–41,

  7. [7]

    16 Simon Gog.Compressed suffix trees: design, construction, and applications

    doi:10.1016/J.TCS.2011.12.002. 16 Simon Gog.Compressed suffix trees: design, construction, and applications. PhD thesis, University of Ulm,

  8. [8]

    From theory to practice: Plug and play with succinct data structures

    17 Simon Gog, Timo Beller, Alistair Moffat, and Matthias Petri. From theory to practice: Plug and play with succinct data structures. In Joachim Gudmundsson and Jyrki Katajainen, editors,Proceedings of the 13th International Symposium on Experimental Algorithms (SEA 2014), pages 326–337. Springer, 2014.doi:10.1007/978-3-319-07959-2\_28. 18 Simon Gog, Juha...

  9. [9]

    Finding topological subgraphs

    doi:10.1007/978-1-4939-2864-4\_642. 20 Roberto Grossi, Ankur Gupta, and Jeffrey Scott Vitter. High-order entropy-compressed text indexes. InProceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2003), pages 841–850. ACM/SIAM,

  10. [10]

    21 Roberto Grossi and Jeffrey Scott Vitter

    URL:http://dl.acm.org/citation.cfm? id=644108.644250. 21 Roberto Grossi and Jeffrey Scott Vitter. Compressed suffix arrays and suffix trees with applications to text indexing and string matching.SIAM Journal on Computing, 35(2):378– 407, 2005.doi:10.1137/S0097539702402354. 22 Roberto Grossi, Jeffrey Scott Vitter, and Bojian Xu. Wavelet trees: From theory ...

  11. [11]

    29 Veli Mäkinen and Gonzalo Navarro

    doi: 10.1016/J.JDA.2017.04.001. 29 Veli Mäkinen and Gonzalo Navarro. Succinct suffix arrays based on run-length encoding. Nordic Journal of Computing, 12(1):40–66,

  12. [12]

    Ian Munro, Yakov Nekrich, and Jeffrey Scott Vitter

    30 J. Ian Munro, Yakov Nekrich, and Jeffrey Scott Vitter. Fast construction of wavelet trees. Theoretical Computer Science, 638:91–97, 2016.doi:10.1016/J.TCS.2015.11.011. 31 Gonzalo Navarro. Wavelet trees for all.Journal of Discrete Algorithms, 25:2–20,

  13. [13]

    32 Gonzalo Navarro.Compact data structures: A practical approach

    doi:10.1016/J.JDA.2013.07.004. 32 Gonzalo Navarro.Compact data structures: A practical approach. Cambridge University Press, Cambridge, UK, 2016.doi:10.1017/cbo9781316588284. 33 Project Gutenberg. Project Gutenberg. Accessed: 2025-11-03. URL:https://www.gutenberg. org/. 34 Rajeev Raman, Venkatesh Raman, and Srinivasa Rao Satti. Succinct indexable dictiona...