pith. sign in

arxiv: 2602.10864 · v2 · submitted 2026-02-11 · 💻 cs.DS

Random Access in Grammar-Compressed Strings: Optimal Trade-Offs in Almost All Parameter Regimes

Pith reviewed 2026-05-16 05:54 UTC · model grok-4.3

classification 💻 cs.DS
keywords grammar compressionrandom accessspace-time trade-offstraight-line grammarlower boundsdata structuresstring algorithms
0
0 comments X

The pith

For any chosen size M between grammar and string, random access to a grammar-compressed string takes time O(log(n log sigma / M w) / log(M w / g log n)).

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a space-time trade-off for random access on strings given by straight-line grammars of size g. For any data structure size M satisfying g log n < M w < n log sigma, it constructs an O(M)-space structure whose query time is the logarithm of the ratio between full string size and effective compressed size, divided by the logarithm of how much M exceeds the grammar. This matters because many strings in practice are compressible, so a modest increase in space above the grammar can yield substantially faster access without expanding the whole string. The result also supplies a matching unconditional lower bound that holds across almost all regimes of grammar size, alphabet size, and data structure size.

Core claim

For any M satisfying g log n < M w < n log sigma, an O(M)-size data structure supports random access queries in O(log(n log sigma / M w) / log(M w / g log n)) time on a string of length n given by a straight-line grammar of size g over alphabet size sigma in the word RAM with word size w = Omega(log n). This bound is matched by an unconditional lower bound in all regimes except very small grammars and relatively small data structures.

What carries the argument

Novel grammar transformations that generalize contracting grammars, which produce a hierarchical representation of the string allowing balanced navigation to any position i.

Load-bearing premise

The input is given as a straight-line grammar and the computation occurs in the word RAM model with word size at least logarithmic in n.

What would settle it

A concrete falsifier is an explicit straight-line grammar of size g together with an M in the stated range for which either no O(M)-space structure achieves the claimed query time or a lower-bound argument shows that the stated time is not necessary.

read the original abstract

A Random Access query to a string $T\in [0..\sigma)^n$ asks for the character $T[i]$ at a given position $i\in [0..n)$. In $O(n\log\sigma)$ bits of space, this fundamental task admits constant-time queries. While this is optimal in the worst case, much research has focused on compressible strings, hoping for smaller data structures that still admit efficient queries. We investigate the grammar-compressed setting, where $T$ is represented by a straight-line grammar. Our main result is a general trade-off that optimizes Random Access time as a function of string length $n$, grammar size (the total length of productions) $g$, alphabet size $\sigma$, data structure size $M$, and word size $w=\Omega(\log n)$ of the word RAM model. For any $M$ with $g\log n<Mw<n\log\sigma$, we show an $O(M)$-size data structure with query time $O(\frac{\log(n\log\sigma\,/\,Mw)}{\log(Mw\,/\,g\log n)})$. Remarkably, we also prove a matching unconditional lower bound that holds for all parameter regimes except very small grammars and relatively small data structures. Previous work focused on query time as a function of $n$ only, achieving $O(\log n)$ time using $O(g)$ space [Bille et al.; SIAM J. Comput. 2015] and $O(\frac{\log n}{\log \log n})$ time using $O(g\log^{\epsilon} n)$ space for any constant $\epsilon > 0$ [Belazzougui et al.; ESA'15], [Ganardi, Je\.z, Lohrey; J. ACM 2021]. The only tight lower bound [Verbin and Yu; CPM'13] was $\Omega(\frac{\log n}{\log\log n})$ for $w=\Theta(\log n)$, $n^{\Omega(1)}\le g\le n^{1-\Omega(1)}$, and $M=g\log^{\Theta(1)}n$. In contrast, our result yields tight bounds in all relevant parameters and almost all regimes. Our data structure admits efficient deterministic construction. It relies on novel grammar transformations that generalize contracting grammars [Ganardi; ESA'21]. Beyond Random Access, its variants support substring extraction, rank, and select.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 1 minor

Summary. The paper claims a parameterized trade-off for random access on grammar-compressed strings: given a straight-line grammar of size g for a string of length n over alphabet size σ, for any M satisfying g log n < M w < n log σ (with w = Ω(log n) the word size), there is an O(M)-space data structure supporting random access in O(log(n log σ / M w) / log(M w / g log n)) time. It also proves a matching unconditional lower bound in all regimes except very small grammars and relatively small data structures. The construction is deterministic, relies on novel grammar transformations generalizing contracting grammars, and extends to substring extraction, rank, and select.

Significance. If the result holds, the work is significant for establishing tight upper and lower bounds on random access across almost all parameter regimes, improving on prior results such as O(log n) time with O(g) space and O(log n / log log n) time with O(g log^ε n) space. The unconditional matching lower bound (except explicit corner cases) and the deterministic construction via grammar transformations are particular strengths, as they provide a complete picture rather than isolated points in the trade-off space.

minor comments (1)
  1. [Abstract] Abstract: the lower-bound exclusions for very small g and M are mentioned but could be stated more explicitly (e.g., the precise thresholds) to make the scope of the matching bound immediately clear to readers.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and encouraging report, which recognizes the significance of our parameterized trade-off for random access on grammar-compressed strings together with the matching unconditional lower bound. We are pleased that the deterministic construction via novel grammar transformations and the extension to substring extraction, rank, and select are viewed as strengths.

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The central trade-off and lower bound are obtained via explicit grammar transformations (generalizing contracting grammars) and information-theoretic arguments on the word-RAM model. These steps are independent of the target result, use standard assumptions (w=Ω(log n), straight-line grammar of size g), and contain no self-definitional reductions, fitted-input renamings, or load-bearing self-citations. Prior citations are external and non-overlapping with the present authors. The result is therefore not equivalent to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper rests on the standard word-RAM model and the definition of straight-line grammars; no new free parameters, ad-hoc axioms, or invented entities are introduced.

axioms (2)
  • domain assumption Word RAM model with word size w = Omega(log n)
    Standard computational model invoked for all time and space bounds.
  • domain assumption Input string given by a straight-line grammar of size g
    The grammar-compressed setting that defines the input size throughout the paper.

pith-pipeline@v0.9.0 · 5762 in / 1418 out tokens · 49545 ms · 2026-05-16T05:54:24.415306+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

14 extracted references · 14 canonical work pages · 1 internal anchor

  1. [1]

    Predecessor search with distance-sensitive query time

    [BBV12] Djamal Belazzougui, Paolo Boldi, and Sebastiano Vigna. Predecessor search with distance-sensitive query time, 2012.arXiv:1209.5441. [BCCG18] Philip Bille, Anders Roy Christiansen, Patrick Hagge Cording, and Inge Li Gørtz. Finger search in grammar-compressed strings.Theory of Computing Systems, 62(8):1715–1735, 2018.doi:10.1007/S00224-017-9839-9. [...

  2. [2]

    [BFH+24] Hideo Bannai, Mitsuru Funakoshi, Diptarama Hendrian, Myuji Matsuda, and Simon J. Puglisi. Height-bounded Lempel-Ziv encodings. In32nd Annual European Symposium on Algorithms (ESA 2024), volume 308, pages 18:1–18:18. Schloss Dagstuhl – Leibniz- Zentrum für Informatik, 2024.doi:10.4230/LIPICS.ESA.2024.18. 35 [BGKS14] Maxim Babenko, Paweł Gawrychows...

  3. [3]

    [CEK+21] Anders Roy Christiansen, Mikko Berggren Ettienne, Tomasz Kociumaka, Gonzalo Navarro, and Nicola Prezza

    URL:https://www.hpl.hp.com/techreports/Compaq-DEC/SRC-RR-124.pdf. [CEK+21] Anders Roy Christiansen, Mikko Berggren Ettienne, Tomasz Kociumaka, Gonzalo Navarro, and Nicola Prezza. Optimal-time dictionary-compressed indexes.ACM Trans. Algorithms, 17(1):8:1–8:39, 2021.doi:10.1145/3426473. [CGW16] Patrick Hagge Cording, Paweł Gawrychowski, and Oren Weimann. B...

  4. [4]

    [CLL+05] Moses Charikar, Eric Lehman, Ding Liu, Rina Panigrahy, Manoj Prabhakaran, Amit Sahai, and Abhi Shelat

    arXiv:2602.04523. [CLL+05] Moses Charikar, Eric Lehman, Ding Liu, Rina Panigrahy, Manoj Prabhakaran, Amit Sahai, and Abhi Shelat. The smallest grammar problem.IEEE Transactions on Infor- mation Theory, 51(7):2554–2576, 2005.doi:10.1109/TIT.2005.850116. [DK24a] Rajat De and Dominik Kempa. Grammar boosting: A new technique for proving lower bounds for compu...

  5. [5]

    doi:10.1007/978-3-031-72200-4_8

    Springer-Verlag. doi:10.1007/978-3-031-72200-4_8. [DK26] Rajat De and Dominik Kempa.Optimal Random Access and Conditional Lower Bounds for 2D Compressed Strings, pages 1903–1915. Society for Industrial and Applied Math- ematics, January 2026.doi:10.1137/1.9781611978971.69. 36 [DLRR13] Akashnil Dutta, Reut Levi, Dana Ron, and Ronitt Rubinfeld. A simple onl...

  6. [6]

    csail.mit.edu/CSGArchives/memos/Memo-61.pdf

    URL:https://csg. csail.mit.edu/CSGArchives/memos/Memo-61.pdf. [FW93] Michael L. Fredman and Dan E. Willard. Surpassing the information theoretic bound with fusion trees.Journal of Computer and System Sciences, 47(3):424–436, December 1993.doi:10.1016/0022-0000(93)90040-4. [Gan21] Moses Ganardi. Compression by Contracting Straight-Line Programs. In Petra M...

  7. [7]

    [GJL21] Moses Ganardi, Artur Jeż, and Markus Lohrey

    Schloss Dagstuhl – Leibniz- Zentrum für Informatik.doi:10.4230/LIPIcs.ESA.2021.45. [GJL21] Moses Ganardi, Artur Jeż, and Markus Lohrey. Balancing straight-line programs.Jour- nal of the ACM, 68(4):27:1–27:40, 2021.doi:10.1145/3457389. [GKPS05] Leszek Gasieniec, Roman M. Kolpakov, Igor Potapov, and Paul Sant. Real-time traversal in grammar-based compressed...

  8. [8]

    [GNP20] Travis Gagie, Gonzalo Navarro, and Nicola Prezza

    IEEE Computer Society, 2005.doi: 10.1109/DCC.2005.78. [GNP20] Travis Gagie, Gonzalo Navarro, and Nicola Prezza. Fully functional suffix trees and optimal text searching in BWT-runs bounded space.Journal of the ACM, 67(1):2:1– 2:54, 2020.doi:10.1145/3375890. [Gro13] Roberto Grossi.Random Access to High-Order Entropy Compressed Text, pages 199–215. Springer...

  9. [9]

    [Lar12] Kasper Green Larsen

    doi:10.1109/18.841160. [Lar12] Kasper Green Larsen. Higher cell probe lower bounds for evaluating polynomials. In 53rd Annual IEEE Symposium on Foundations of Computer Science, FOCS 2012, New Brunswick, NJ, USA, October 20-23, 2012, pages 293–301. IEEE Computer Society, 2012.doi:10.1109/FOCS.2012.21. [LMN24] Zsuzsanna Lipták, Francesco Masillo, and Gonzal...

  10. [10]

    [Pˇ at11] Mihai Pˇ atraşcu

    Springer Nature Switzerland.doi:10.1007/978-3-031-55598-5_5. [Pˇ at11] Mihai Pˇ atraşcu. Unifying the landscape of cell-probe lower bounds.SIAM Journal on Computing, 40(3):827–847, 2011.doi:10.1137/09075336X. [PT14] Mihai Pˇ atraşcu and Mikkel Thorup. Dynamic integer sets with optimal rank, select, and predecessor search. In2014 IEEE 55th Annual Symposium...

  11. [11]

    [RRS07] Rajeev Raman, Venkatesh Raman, and Srinivasa Rao Satti

    doi:10.1007/S00453-012-9618-6. [RRS07] Rajeev Raman, Venkatesh Raman, and Srinivasa Rao Satti. Succinct indexable dictio- naries with applications to encodingk-ary trees, prefix sums and multisets.ACM Trans. Algorithms, 3(4):43, 2007.doi:10.1145/1290672.1290680. [Rub76] Frank Rubin. Experiments in text file compression.Communications of the ACM, 19(11):61...

  12. [12]

    [SG06] Kunihiko Sadakane and Roberto Grossi

    doi:10.1016/S0304-3975(02)00777-6. [SG06] Kunihiko Sadakane and Roberto Grossi. Squeezing succinct data structures into entropy bounds. InProceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2006, Miami, Florida, USA, January 22-26, 2006, pages 1230–1239. ACM Press,

  13. [13]

    InACM Symposium on Theory of Computing

    URL:http://dl.acm.org/citation.cfm?id=1109557.1109693. [SNYI25] Hiroki Shibata, Yuto Nakashima, Yutaro Yamaguchi, and Shunsuke Inenaga. LZBE: an LZ-style compressor supportingO(logn)-time random access, June 2025.arXiv: 2506.20107. 39 [SW19] SandipSinhaandOmriWeinstein. LocaldecodabilityoftheBurrows-Wheelertransform. InProceedings of the 51st Annual ACM S...

  14. [14]

    doi:10.1007/978-3-642-38905-4_24

    Springer Berlin Heidelberg. doi:10.1007/978-3-642-38905-4_24. [WY14] Yaoyu Wang and Yitong Yin.Certificates in Data Structures, pages 1039–1050. Springer Berlin Heidelberg, 2014.doi:10.1007/978-3-662-43948-7_86. [ZL77] Jacob Ziv and Abraham Lempel. A universal algorithm for sequential data compression. IEEE Transactions on Information Theory, 23(3):337–34...