Ranked MSO-enumeration over compressed words
Pith reviewed 2026-06-28 07:51 UTC · model grok-4.3
The pith
Fixed MSO queries admit ranked enumeration with linear preprocessing and constant delay on grammar-compressed strings.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
It is shown that the ranked query enumeration problem for a fixed MSO-query on strings can be solved with linear preprocessing and constant delay in the grammar-compressed setting, where the input string is given by a straight-line program. Ranked means that the output tuples are printed in a specific order that must itself be MSO-definable. This yields the first constant-delay result for ranked enumeration on compressed data. As a corollary, for any fixed polyregular function f and any word w given by a straight-line program of size n, the symbols of f(w) can be listed from left to right with constant delay after O(n) preprocessing.
What carries the argument
Factorization trees lifted to straight-line programs, which support constant-delay traversal while respecting the MSO-definable output order.
If this is right
- Any fixed MSO query on an SLP-compressed word can be enumerated in MSO-definable order after linear preprocessing.
- Symbols of the image of a compressed word under a fixed polyregular function can be listed left-to-right with constant delay.
- The constant-delay guarantee holds only when the required output order is MSO-definable.
- The same preprocessing supports multiple fixed queries on the same compressed word.
Where Pith is reading between the lines
- The result suggests that highly repetitive inputs (DNA, version histories) could support fast MSO querying without full decompression.
- If factorization trees can be built for other compression formats such as Lempel-Ziv, similar constant-delay guarantees might transfer.
- The approach separates the cost of handling repetition from the cost of enforcing a particular output order.
Load-bearing premise
The output order required by the enumeration task must itself be expressible by a formula of monadic second-order logic.
What would settle it
An explicit MSO query, straight-line program, and MSO-definable order such that every algorithm with only linear preprocessing produces some output tuple after a delay that grows with the size of the compressed input.
read the original abstract
It is shown that the ranked query enumeration problem for a fixed MSO-query on strings can be solved with linear preprocessing and constant delay in the grammar-compressed setting, where the input string is given by a so-called straight-line program, i.e., a context-free grammar that produces exactly one string. Moreover, `ranked' means that the output tuples of the MSO-query are printed in a specific order that has to be MSO-definable. This is the first result for ranked query enumeration on compressed data. A corollary of this result is that for a fixed polyregular function $f$ and a word $w$ that is given by a straight-line program of size $n$, one can list after preprocessing time $\mathcal{O}(n)$ the symbols in $f(w)$ from left to right with constant delay, which generalizes a result of Bojanczyk for the case where $w$ is uncompressed. The proofs for these results are based on factorization trees, which are made accessible to the grammar-compressed setting (a contribution of independent interest).
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript shows that for any fixed MSO formula defining a query and an MSO-definable linear order on its output tuples, ranked enumeration over an SLP-compressed string of size n can be performed after O(n) preprocessing with constant delay per tuple. A corollary states that the symbols of any fixed polyregular function f applied to an SLP-compressed word w can be enumerated left-to-right with O(n) preprocessing and constant delay, generalizing an earlier result of Bojańczyk for uncompressed inputs. The proofs rely on an adaptation of factorization trees to the grammar-compressed setting, presented as a contribution of independent interest.
Significance. If the central construction is correct, the result is significant: it supplies the first constant-delay ranked MSO enumeration algorithm that works directly on SLP-compressed strings rather than requiring decompression. The factorization-tree adaptation is explicitly isolated as reusable, which strengthens the contribution beyond the enumeration application. The polyregular-function corollary is a clean and useful consequence.
major comments (2)
- [§3] §3 (Factorization trees in the compressed setting): the manuscript must explicitly verify that the height and branching of the adapted factorization tree remain logarithmic in the uncompressed length while the construction itself runs in time linear in the SLP size; otherwise the claimed O(n) preprocessing bound does not follow.
- [Theorem 4.2] Theorem 4.2 (ranked enumeration): the reduction from the MSO query to the ranked enumeration problem on the factorization tree must be shown to preserve the MSO-definable order without introducing an extra logarithmic factor in the delay; the current abstract statement leaves this step implicit.
minor comments (2)
- The notation for the SLP size versus the uncompressed length should be introduced once and used consistently (currently both n and |w| appear without explicit relation).
- Add a short paragraph comparing the new constant-delay bound with the best known non-ranked enumeration results on SLPs (e.g., those based on Courcelle’s theorem).
Simulated Author's Rebuttal
We thank the referee for the detailed review and for recognizing the significance of our results on ranked MSO enumeration over compressed strings. We address each major comment below and will revise the manuscript accordingly to incorporate the requested clarifications.
read point-by-point responses
-
Referee: [§3] §3 (Factorization trees in the compressed setting): the manuscript must explicitly verify that the height and branching of the adapted factorization tree remain logarithmic in the uncompressed length while the construction itself runs in time linear in the SLP size; otherwise the claimed O(n) preprocessing bound does not follow.
Authors: We agree that an explicit verification is necessary to rigorously support the preprocessing bound. In Section 3, the adaptation is presented, but we will add a new lemma (Lemma 3.X) in the revised manuscript that formally proves: (i) the height of the factorization tree is O(log N) for uncompressed length N, (ii) the branching factor is constant, and (iii) the construction from the SLP of size n takes O(n) time. This lemma will be used to establish the O(n) preprocessing for the enumeration algorithm. revision: yes
-
Referee: [Theorem 4.2] Theorem 4.2 (ranked enumeration): the reduction from the MSO query to the ranked enumeration problem on the factorization tree must be shown to preserve the MSO-definable order without introducing an extra logarithmic factor in the delay; the current abstract statement leaves this step implicit.
Authors: We acknowledge that the proof of Theorem 4.2 leaves the details of the reduction somewhat implicit. In the revised version, we will expand the proof to explicitly show that the MSO-definable order on tuples is preserved under the mapping to the factorization tree (because the tree respects the linear order of positions in the string), and that the constant-delay enumeration on the tree does not introduce an extra logarithmic factor in the per-tuple delay. We will include a separate claim or sublemma for this preservation and delay analysis. revision: yes
Circularity Check
No significant circularity
full rationale
The paper presents an algorithmic construction for ranked MSO enumeration on SLP-compressed strings using factorization trees adapted to the compressed setting. No load-bearing step reduces by definition, fitted parameter, or self-citation chain to the input claim itself. The result is framed as a new technical contribution with independent interest, relying on existing MSO and SLP concepts without internal circular reductions or renaming of known results as derivations.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math MSO logic on strings is well-defined and decidable for fixed queries
- domain assumption Straight-line programs correctly represent the input string
Reference graph
Works this paper leans on
-
[1]
A Factorization Theorem for Forest Algebras
URL:https://arxiv.org/abs/2605.10368,arXiv:2605.10368. 26 Ranked MSO-enumeration over compressed words 2 Antoine Amarilli, Pierre Bourhis, Florent Capelli, and Mikaël Monet. Ranked enumeration for MSO on trees via knowledge compilation. InProceedings of the 27th International Conference on Database Theory, ICDT 2024, LIPIcs, pages 25:1–25:18. Schloss Dags...
work page internal anchor Pith review Pith/arXiv arXiv doi:10.4230/lipics.icdt.2024.25 2024
-
[2]
and the 15th Annual Conference of the EACSL, volume 4207 ofLecture Notes in Computer Science, pages 167–181. Springer, 2006.doi:10.1007/11874683\_11. 6 Philip Bille, Gad M. Landau, Rajeev Raman, Kunihiko Sadakane, Srinivasa Rao Satti, and Oren Weimann. Random access to grammar-compressed strings and trees.SIAM Journal on Computing, 44(3):513–539, 2015.doi...
work page internal anchor Pith review Pith/arXiv arXiv doi:10.1007/11874683 2006
-
[3]
URL:http://arxiv. org/abs/1810.08760,arXiv:1810.08760. 9 Mikolaj Bojanczyk, Sandra Kiefer, and Nathan Lhote. String-to-string interpretations with polynomial-size output. InProceedings of the 46th International Colloquium on Automata, Languages, and Programming, ICALP 2019, LIPIcs, pages 106:1–106:14. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 201...
work page internal anchor Pith review Pith/arXiv arXiv doi:10.4230/lipics.icalp.2019.106 2019
-
[4]
Xpath evaluation in linear time
11 Mikolaj Bojanczyk and Pawel Parys. Xpath evaluation in linear time. InProceedings of the 27th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2008, pages 241–250. ACM, 2008.doi:10.1145/1376916.1376951. 12 Pierre Bourhis, Alejandro Grez, Louis Jachiet, and Cristian Riveros. Ranked enumeration of MSO logic on words. InProceedin...
-
[5]
doi:10.1007/s00224-019-09942-y. 18 Leszek Gasieniec, Roman M. Kolpakov, Igor Potapov, and Paul Sant. Real-time traversal in grammar-based compressed files. InProceedings to the 2005 Data Compression Conference, DCC 2005, page
-
[6]
IEEE Computer Society, 2005.doi:10.1109/DCC.2005.78. M. Lohrey 27 19 Pawel Gawrychowski, Florin Manea, and Markus L. Schmid. Revisiting weighted information extraction: A simpler and faster algorithm for ranked enumeration.Proceeedings of the ACM on Managment of Data, 2(5):222:1–222:19, 2024.doi:10.1145/3695840. 20 J. E. Hopcroft and J. D. Ullman.Introduc...
-
[7]
Enumeration of monadic second-order queries on trees
21 Wojciech Kazana and Luc Segoufin. Enumeration of monadic second-order queries on trees. ACM Transactions on Computational Logic, 14(4):25:1–25:12, 2013.doi:10.1145/2528928. 22 Sarah Kleest-Meißner, Jonas Marasus, and Matthias Niewerth. MSO queries on trees: Enu- merating answers under updates using forest algebras.Logical Methods in Computer Science, 2...
-
[8]
27 Markus Lohrey, Sebastian Maneth, and Markus L
doi:10.1007/s00453-017-0331-3. 27 Markus Lohrey, Sebastian Maneth, and Markus L. Schmid. FO-query enumeration over SLP- compressed structures of bounded degree. InProceedings of the 50th International Symposium on Mathematical Foundations of Computer Science, MFCS 2025, LIPIcs, pages 69:1–69:20. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2025.doi:...
-
[9]
29 Martin Muñoz and Cristian Riveros
URL: https://doi.org/10.46298/theoretics.26.6, doi: 10.46298/THEORETICS.26.6. 29 Martin Muñoz and Cristian Riveros. Constant-delay enumeration for SLP-compressed docu- ments.Logical Methods in Computer Science, 21(1),
-
[10]
doi:10.46298/LMCS-21(1:17)2025. 30 Martín Muñoz. Dynamic direct (ranked) access of MSO query evaluation over SLP-compressed strings,
-
[11]
URL:https://arxiv.org/abs/2603.13058,arXiv:2603.13058. 31 Matthias Niewerth. MSO queries on trees: Enumerating answers under updates using forest algebras. InProceedings of the 33rd Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2018, pages 769–778. ACM, 2018.doi:10.1145/3209108.3209144. 32 Matthias Niewerth and Luc Segoufin. Enumeration of ...
-
[12]
33 Jacques Sakarovitch.Elements of Automata Theory
doi:10.1145/ 3196959.3196961. 33 Jacques Sakarovitch.Elements of Automata Theory. Cambridge University Press,
-
[13]
doi:10.1017/CBO9781139195218. 34 Markus L. Schmid and Nicole Schweikardt. Spanner evaluation over SLP-compressed docu- ments. InProceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2021, pages 153–165. ACM, 2021.doi:10.1145/3452021.3458325. 35 Markus L. Schmid and Nicole Schweikardt. Document spanners - A brief...
-
[14]
doi:10.1145/3517804.3526069. 36 Markus L. Schmid and Nicole Schweikardt. Query evaluation over SLP-represented document databases with complex document editing. InProceedings of the 41st ACM SIGMOD-SIGACT- SIGAI Symposium on Principles of Database Systems, PODS 2022, pages 79–89. ACM,
-
[15]
28 Ranked MSO-enumeration over compressed words 37 Imre Simon
doi:10.1145/3517804.3524158. 28 Ranked MSO-enumeration over compressed words 37 Imre Simon. Factorization forests of finite height.Theoretical Computer Science, 72(1):65–94, 1990.doi:10.1016/0304-3975(90)90047-L. 38 Howard Straubing.Finite Automata, Formal Logic, and Circuit Complexity. Birkhäuser, Boston, Basel, Berlin, 1994.doi:10.1007/978-1-4612-0289-9...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.