Dynamic Longest Common Substring in Polylogarithmic Time
Pith reviewed 2026-05-24 14:35 UTC · model grok-4.3
The pith
An algorithm maintains the longest common substring of two dynamic strings under edits in amortized O(log^7 n) time with high probability.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper presents an algorithm that processes each edit operation on two dynamic strings in amortized O(log^7 n) time with high probability while correctly returning a longest common substring after every update. It achieves this by exploiting the local consistency property of a dynamic-string parsing and by maintaining two dynamic trees with labeled bicolored leaves. The same work establishes an Omega(log n / log log n) lower bound on the update time of any polynomial-size data structure for the fully dynamic case and of any O-tilde(n)-size data structure for the case of one static and one dynamic string; both lower bounds hold even when amortization and randomization are allowed.
What carries the argument
The local consistency property of the dynamic-string parsing from prior work, maintained inside two labeled dynamic trees with bicolored leaves.
If this is right
- Dynamic LCS can be maintained in real time for strings whose length reaches millions or billions.
- The lower bound rules out substantially faster solutions in the stated data-structure models even when amortization and randomization are permitted.
- The same parsing-and-tree technique supplies a template for other dynamic string problems that rely on consistent segmentations.
- Semi-dynamic LCS (one string fixed) inherits the same logarithmic lower bound when the data structure is required to stay near-linear in size.
Where Pith is reading between the lines
- The polylogarithmic upper bound and the Omega(log n / log log n) lower bound leave only a polylogarithmic gap, suggesting that further improvements would need either stronger models or new lower-bound techniques.
- Because the solution separates the parsing maintenance from the LCS extraction, the same framework could be reused for dynamic variants of other contiguous-substring problems such as longest repeated substring or longest palindrome.
- The reliance on bicolored leaf labels in dynamic trees indicates that similar color-based aggregation may help in maintaining other pairwise string statistics under edits.
Load-bearing premise
The local consistency property of the dynamic-string parsing continues to hold after each edit and can be maintained inside the two labeled dynamic trees.
What would settle it
A sequence of edits on which the maintained parsing loses local consistency or on which the reported LCS is incorrect, or a polynomial-size data structure that achieves o(log n / log log n) amortized update time for dynamic LCS.
Figures
read the original abstract
The longest common substring problem consists in finding a longest string that appears as a (contiguous) substring of two input strings. We consider the dynamic variant of this problem, in which we are to maintain two dynamic strings $S$ and $T$, each of length at most $n$, that undergo edit operations, i.e., substitutions, insertions, and deletions of letters, in order to be able to return a longest common substring after each edit. Amir, Charalampopoulos, Pissis, and Radoszewski [Algorithmica 2020] presented a solution for this problem that requires $\tilde{\mathcal{O}}(n^{2/3})$ time per update. This brought the challenge of determining whether there exists a solution with polylogarithmic update time or we should expect a polynomial (conditional) lower bound. We answer this question by designing an exponentially faster algorithm that processes each edit operation in amortized $O(\log^7 n)$ time with high probability. Our solution relies on exploiting the local consistency of the parsing of a collection of dynamic strings due to Gawrychowski, Karczmarz, Kociumaka, L\k{a}cki, and Sankowski [SODA 2018], and on maintaining two dynamic trees with labeled bicolored leaves. We complement our solution with a lower bound of $\Omega(\log n/ \log\log n)$ for the update time of any polynomial-size data structure that maintains an LCS of two dynamic strings, and the same lower bound for the update time of any $\tilde{\mathcal{O}}(n)$-size data structure that maintains the LCS of a static and a dynamic string. Both lower bounds hold even allowing amortization and randomization. This requires extending P\u{a}tra\c{s}cu's reduction from the lopsided set disjointness problem to the butterfly reachability problem [SICOMP 2011].
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a dynamic algorithm for the longest common substring (LCS) problem on two strings S and T of length at most n undergoing edits (insertions, deletions, substitutions). It maintains an LCS after each edit in amortized O(log^7 n) time with high probability by exploiting the local consistency property of the dynamic-string parsing from Gawrychowski et al. (SODA 2018) and maintaining two labeled bicolored dynamic trees. The work also gives an Ω(log n / log log n) lower bound on update time for any polynomial-size data structure maintaining LCS of two dynamic strings (and the same bound for Õ(n)-size structures maintaining LCS of one static and one dynamic string), obtained by extending Pătraşcu's reduction from lopsided set disjointness to butterfly reachability; both bounds hold even with amortization and randomization.
Significance. If the stated bounds and reductions hold, the result resolves the open question left by the prior Õ(n^{2/3}) solution, showing that polylogarithmic update time is achievable for dynamic LCS and that a logarithmic dependence is necessary in the stated models. The explicit construction for maintaining the parsing and the bicolored-tree operations, together with the independent lower-bound reduction, constitute a substantial technical advance in dynamic string algorithms.
minor comments (3)
- [Abstract] The precise failure probability (e.g., 1/n^c) for the high-probability bound should be stated explicitly when the O(log^7 n) claim is first introduced, rather than only in the abstract.
- [Section describing the data structures] Notation for the two dynamic trees (e.g., how leaves are labeled and colored, and which tree stores which parsing) should be introduced with a single consistent diagram or table early in the algorithmic section to aid readability.
- [Lower-bound section] The lower-bound section should include a short self-contained statement of the butterfly-reachability problem before describing the extension of Pătraşcu's reduction.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of the paper, the accurate summary of our contributions, and the recommendation for minor revision. No major comments were raised in the report.
Circularity Check
No significant circularity; derivation self-contained
full rationale
The central algorithm maintains the local consistency property of the 2018 SODA parsing under edits via an explicit construction inside two labeled dynamic trees, then derives the amortized O(log^7 n) bound directly from the stated properties of those structures. The lower bound extends Pătraşcu's external reduction to butterfly reachability. No equation or claim reduces by construction to a fitted input, self-definition, or unverified self-citation chain; the 2018 result supplies an independent, externally published primitive whose constants are not tuned to the present target. The derivation therefore contains independent content and does not match any enumerated circularity pattern.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Locally consistent parsing of dynamic strings remains valid after each edit (Gawrychowski et al. SODA 2018)
- standard math Dynamic trees with labeled bicolored leaves support the required navigation and label queries in polylog time
Reference graph
Works this paper leans on
-
[1]
Paniz Abedin, Sahar Hooshmand, Arnab Ganguly, and Sharma V. Thankachan. The heaviest induced ancestors problem revisited. In 29th CPM, pages 20:1–20:13, 2018. doi: 10.4230/LIPIcs.CPM.2018.20
-
[2]
Pattern matching in dynamic texts
Stephen Alstrup, Gerth Stølting Brodal, and Theis Rauhe. Pattern matching in dynamic texts. In 11th SODA , pages 819–828, 2000. URL: http://dl.acm.org/citation.cfm? id=338219.338645
-
[3]
In 32nd European Conference on Object-Oriented Programming (ECOOP 2018)
Amihood Amir and Itai Boneh. Locally maximal common factors as a tool for efficient dynamic string algorithms. In 29th CPM, pages 11:1–11:13, 2018. doi:10.4230/LIPIcs. CPM.2018.11
-
[4]
Amihood Amir and Itai Boneh. Dynamic palindrome detection. CoRR, abs/1906.09732,
work page internal anchor Pith review Pith/arXiv arXiv 1906
-
[5]
Repetition Detection in a Dynamic String
Amihood Amir, Itai Boneh, Panagiotis Charalampopoulos, and Eitan Kondratovsky. Repetition Detection in a Dynamic String. In 27th ESA , pages 5:1–5:18, 2019. doi: 10.4230/LIPIcs.ESA.2019.5
-
[6]
Amihood Amir, Panagiotis Charalampopoulos, Costas S. Iliopoulos, Solon P. Pissis, and Jakub Radoszewski. Longest common factor after one edit operation. In 24th SPIRE , pages 14–26, 2017. doi:10.1007/978-3-319-67428-5\_2
-
[7]
Amihood Amir, Panagiotis Charalampopoulos, Solon P. Pissis, and Jakub Radoszewski. Longest common substring made fully dynamic. In 27th ESA, pages 6:1–6:17, 2019. doi: 10.4230/LIPIcs.ESA.2019.6. 24
-
[8]
Landau, Moshe Lewenstein, and Dina Sokol
Amihood Amir, Gad M. Landau, Moshe Lewenstein, and Dina Sokol. Dynamic text and static pattern matching. ACM Trans. Algorithms, 3(2):19, 2007. doi:10.1145/1240233. 1240242
-
[9]
Iliopoulos, Tomasz Kociu- maka, Solon P
Panagiotis Charalampopoulos, Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociu- maka, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. Linear- time algorithm for long LCF with k mismatches. In 29th CPM , pages 23:1–23:16, 2018. doi:10.4230/LIPIcs.CPM.2018.23
-
[10]
Faster approx- imate pattern matching: A unified approach
Panagiotis Charalampopoulos, Tomasz Kociumaka, and Philip Wellnitz. Faster approx- imate pattern matching: A unified approach. CoRR, abs/2004.08350, 2020. arXiv: 2004.08350
-
[11]
Dictionary matching and indexing with errors and don’t cares
Richard Cole, Lee-Ad Gottlieb, and Moshe Lewenstein. Dictionary matching and indexing with errors and don’t cares. In 36th STOC, pages 91–100, 2004. doi:10.1145/1007352. 1007374
-
[12]
Optimal suffix tree construction with large alphabets
Martin Farach. Optimal suffix tree construction with large alphabets. In 38th FOCS, pages 137–143, 1997. doi:10.1109/SFCS.1997.646102
-
[13]
Martin Farach and S. Muthukrishnan. Perfect hashing for strings: Formalization and algorithms. In 7th CPM, pages 130–140, 1996. doi:10.1007/3-540-61258-0\_11
- [14]
-
[15]
Heaviest induced ancestors and longest common substrings
Travis Gagie, Paweł Gawrychowski, and Yakov Nekrich. Heaviest induced ancestors and longest common substrings. In 25th CCCG, 2013. URL: http://cccg.ca/proceedings/ 2013/papers/paper_29.pdf
work page 2013
-
[16]
Paweł Gawrychowski, Adam Karczmarz, Tomasz Kociumaka, Jakub Lacki, and Piotr Sankowski. Optimal dynamic strings. In 29th SODA , pages 1509–1528, 2018. doi: 10.1137/1.9781611975031.99
-
[17]
Monika Henzinger, Sebastian Krinninger, Danupon Nanongkai, and Thatchaphol Saranu- rak. Unifying and strengthening hardness for dynamic problems via the online matrix- vector multiplication conjecture. In 47th STOC , pages 21–30, 2015. doi:10.1145/ 2746539.2746609
-
[18]
Longest common extensions with recompression
Tomohiro I. Longest common extensions with recompression. In 28th CPM , pages 18:1– 18:15, 2017. doi:10.4230/LIPIcs.CPM.2017.18
-
[19]
Faster fully compressed pattern matching by recompression
Artur Jeż. Faster fully compressed pattern matching by recompression. ACM Transactions on Algorithms, 11(3):20:1–20:43, 2015. doi:10.1145/2631920
-
[20]
Recompression: A simple and powerful technique for word equations
Artur Jeż. Recompression: A simple and powerful technique for word equations. J. ACM, 63(1):4:1–4:51, 2016. doi:10.1145/2743014
-
[21]
Tomasz Kociumaka, Jakub Radoszewski, and Tatiana A. Starikovskaya. Longest common substring with approximately k mismatches. Algorithmica, 81(6):2633–2652, 2019. doi: 10.1007/s00453-019-00548-x
-
[22]
Starikovskaya, and Hjalte Wedel Vildhøj
Tomasz Kociumaka, Tatiana A. Starikovskaya, and Hjalte Wedel Vildhøj. Sublinear space algorithms for the longest common substring problem. In 22nd ESA, pages 605–617, 2014. doi:10.1007/978-3-662-44777-2_50 . 25
-
[23]
A data structure for multi-dimensional range reporting
Yakov Nekrich. A data structure for multi-dimensional range reporting. In 23rd SOCG, pages 344–353, 2007. doi:10.1145/1247069.1247130
-
[24]
Unifying the landscape of cell-probe lower bounds
Mihai Patrascu. Unifying the landscape of cell-probe lower bounds. SIAM J. Comput. , 40(3):827–847, 2011. doi:10.1137/09075336X
-
[25]
Mihai Pˇ atra¸ scu and Erik D. Demaine. Logarithmic lower bounds in the cell-probe model. SIAM J. Comput. , 35(4):932–963, 2006. doi:10.1137/S0097539705447256
-
[26]
Time-space trade-offs for predecessor search
Mihai Pˇ atra¸ scu and Mikkel Thorup. Time-space trade-offs for predecessor search. In 38th STOC, pages 232–240, 2006. doi:10.1145/1132516.1132551
-
[27]
S. Sahinalp and U. Vishkin. Efficient approximate and dynamic matching of patterns using a labeling paradigm. In 54th FOCS, pages 320–328, 1996. doi:10.1109/SFCS.1996. 548491
-
[28]
Starikovskaya and Hjalte Wedel Vildhøj
Tatiana A. Starikovskaya and Hjalte Wedel Vildhøj. Time-space trade-offs for the longest common substring problem. In 24th CPM , pages 223–234, 2013. doi:10.1007/ 978-3-642-38905-4_22
work page 2013
-
[29]
Thankachan, Alberto Apostolico, and Srinivas Aluru
Sharma V. Thankachan, Alberto Apostolico, and Srinivas Aluru. A provably efficient algo- rithm for the k-mismatch average common substring problem. Journal of Computational Biology, 23(6):472–482, 2016. doi:10.1089/cmb.2015.0235
-
[30]
Linear pattern matching algorithms
Peter Weiner. Linear pattern matching algorithms. In 14th FOCS , pages 1–11, 1973. doi:10.1109/SWAT.1973.13
-
[31]
Dan E. Willard. Log-logarithmic worst-case range queries are possible in space Θ(n). In- formation Processing Letters, 17(2):81 – 84, 1983. doi:10.1016/0020-0190(83)90075-3
-
[32]
Dan E. Willard and George S. Lueker. Adding range restriction capability to dynamic data structures. J. ACM, 32(3):597–617, 1985. doi:10.1145/3828.3839. 26
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.