Merging RLBWTs adaptively
Pith reviewed 2026-05-17 21:03 UTC · model grok-4.3
The pith
Two run-length compressed Burrows-Wheeler transforms can be merged into a run-length compressed extended Burrows-Wheeler transform in linear space and near-linear time.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We show how to merge two run-length compressed Burrows-Wheeler Transforms (RLBWTs) into a run-length compressed extended Burrows-Wheeler Transform (eBWT) in O(r) space and O((r + L) log(m + n)) time, where m and n are the lengths of the uncompressed strings, r is the number of runs in the final eBWT and L is the sum of its irreducible LCP values.
What carries the argument
An adaptive merging procedure that interleaves runs from the two input RLBWTs while preserving run-length compression in the output eBWT, using irreducible LCP values to guide decisions.
If this is right
- Larger string collections can have their eBWTs assembled incrementally by repeated pairwise merges without full decompression at any step.
- Memory stays proportional to the compressed size of the final result rather than the total uncompressed length.
- Index construction time improves when the merged string contains long runs or low irreducible LCP sums.
- The structure supports online addition of new strings while keeping the representation compressed.
Where Pith is reading between the lines
- Repeated application of the merge could handle more than two inputs while retaining the same asymptotic bounds.
- The technique may extend to other compressed suffix structures that rely on run-length encoding of the BWT.
- Practical tests on genomic datasets would reveal whether the logarithmic factor remains negligible in real-world merges.
Load-bearing premise
The inputs are valid RLBWTs whose run information and irreducible LCP values can be accessed or computed efficiently during the merge process.
What would settle it
Implement the merge on two short strings with precomputed RLBWTs, then check whether the output eBWT and its run count exactly match the eBWT obtained by direct concatenation and transformation.
read the original abstract
We show how to merge two run-length compressed Burrows-Wheeler Transforms (RLBWTs) into a run-length compressed extended Burrows-Wheeler Transform (eBWT) in $O (r)$ space and $O ((r + L) \log (m + n))$ time, where $m$ and $n$ are the lengths of the uncompressed strings, $r$ is the number of runs in the final eBWT and $L$ is the sum of its irreducible LCP values.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents an adaptive algorithm to merge two run-length compressed Burrows-Wheeler Transforms (RLBWTs) into a run-length compressed extended Burrows-Wheeler Transform (eBWT). The central result is a procedure that uses O(r) space and O((r + L) log(m + n)) time, where m and n are the uncompressed string lengths, r is the number of runs in the output eBWT, and L is the sum of its irreducible LCP values.
Significance. If the claimed bounds hold, the result is significant for compressed string processing pipelines, as it enables merging without materializing the full strings and charges costs to output-sensitive parameters r and L rather than input sizes. This aligns with practical needs in text indexing and bioinformatics where RLBWTs are used for repetitive data. The adaptive, output-dependent complexity is a strength when the derivation is parameter-free with respect to the input RLBWT sizes.
minor comments (2)
- The preliminaries section would benefit from a small worked example showing how irreducible LCP values are maintained during the merge to clarify the data structures used for the priority-queue operations.
- A brief discussion of how the algorithm handles edge cases (e.g., when one input RLBWT is empty or when runs cross the merge boundary) would improve readability without affecting the asymptotic claims.
Simulated Author's Rebuttal
We thank the referee for their positive assessment of our manuscript and for recommending minor revision. The referee's summary accurately reflects the central contribution: an adaptive algorithm to merge two RLBWTs into an eBWT using O(r) space and O((r + L) log(m + n)) time. We are pleased that the significance for compressed string processing pipelines is recognized. Below we respond to the report.
Circularity Check
No significant circularity in algorithmic construction
full rationale
The paper describes a direct algorithmic procedure for merging two RLBWTs while producing a run-length compressed eBWT. The O(r) space and O((r + L) log(m + n)) time bounds are expressed explicitly in terms of output properties (number of runs r and sum of irreducible LCP values L) rather than being fitted to or defined by the inputs. No self-definitional steps, fitted parameters renamed as predictions, or load-bearing self-citations appear in the derivation; the procedure relies on standard priority-queue operations and maintenance of lexicographic order under the given definitions of RLBWT and eBWT. The result is a self-contained constructive algorithm.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Standard properties of the Burrows-Wheeler Transform and run-length encoding hold for the given input structures.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We show how to merge two run-length compressed Burrows-Wheeler Transforms (RLBWTs) into a run-length compressed extended Burrows-Wheeler Transform (eBWT) in O(r) space and O((r + L) log(m + n)) time
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
L is the sum of its irreducible LCP values
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
CVIT 2016 23:14 Merging RLBWTs adaptively A Proof of Lemma 7 ▶Lemma 7.Letπbe a permutation on{1,...,n}and P={1}∪{i: 1<i≤n,π(i)̸=π(i−1) + 1}. Given the set{(i,π(i)) : i∈P}and an integerd≥2, we can build a set{(i,π(i)) : i∈P′}, where P⊆P′⊆{1,...,n}, ifaandbare consecutive elements in{π(i) :i∈P′}∪{n+ 1}then|[a,b)∩P′|<2d, |P′|≤d|P| d−1. This takesO(|P|log|P|)...
work page 2016
-
[2]
= 0and |Pj+1|≤|Pj|+ 1, we use at most|P| d−1 rounds and returnP′with |P′|≤|P|+|P| d−1= d|P| d−1. Since the(j + 1)st round takesO(log|Pj|) =O(log|P|)time for all j, over all the rounds we useO ( |P|log|P| d ) total time.◀ CVIT 2016
work page 2016
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.