pith. sign in

arxiv: 2006.02408 · v2 · submitted 2020-06-03 · 💻 cs.DS

Dynamic Longest Common Substring in Polylogarithmic Time

Pith reviewed 2026-05-24 14:35 UTC · model grok-4.3

classification 💻 cs.DS
keywords dynamic longest common substringedit operationspolylogarithmic timedata structureslocally consistent parsingdynamic treeslower boundsamortized analysis
0
0 comments X

The pith

An algorithm maintains the longest common substring of two dynamic strings under edits in amortized O(log^7 n) time with high probability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The dynamic longest common substring problem requires maintaining two strings of length at most n that receive insertions, deletions, and substitutions while reporting their LCS after every change. Earlier solutions needed roughly n to the two-thirds time per update. This work gives an exponentially faster solution whose updates cost amortized O(log^7 n) time with high probability. The approach rests on preserving a locally consistent parsing of the strings inside two dynamic trees whose leaves carry bicolored labels. The paper also proves that any polynomial-size data structure needs Omega(log n / log log n) update time even with amortization and randomization, and the same bound holds for near-linear-size structures when one string is static.

Core claim

The paper presents an algorithm that processes each edit operation on two dynamic strings in amortized O(log^7 n) time with high probability while correctly returning a longest common substring after every update. It achieves this by exploiting the local consistency property of a dynamic-string parsing and by maintaining two dynamic trees with labeled bicolored leaves. The same work establishes an Omega(log n / log log n) lower bound on the update time of any polynomial-size data structure for the fully dynamic case and of any O-tilde(n)-size data structure for the case of one static and one dynamic string; both lower bounds hold even when amortization and randomization are allowed.

What carries the argument

The local consistency property of the dynamic-string parsing from prior work, maintained inside two labeled dynamic trees with bicolored leaves.

If this is right

  • Dynamic LCS can be maintained in real time for strings whose length reaches millions or billions.
  • The lower bound rules out substantially faster solutions in the stated data-structure models even when amortization and randomization are permitted.
  • The same parsing-and-tree technique supplies a template for other dynamic string problems that rely on consistent segmentations.
  • Semi-dynamic LCS (one string fixed) inherits the same logarithmic lower bound when the data structure is required to stay near-linear in size.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The polylogarithmic upper bound and the Omega(log n / log log n) lower bound leave only a polylogarithmic gap, suggesting that further improvements would need either stronger models or new lower-bound techniques.
  • Because the solution separates the parsing maintenance from the LCS extraction, the same framework could be reused for dynamic variants of other contiguous-substring problems such as longest repeated substring or longest palindrome.
  • The reliance on bicolored leaf labels in dynamic trees indicates that similar color-based aggregation may help in maintaining other pairwise string statistics under edits.

Load-bearing premise

The local consistency property of the dynamic-string parsing continues to hold after each edit and can be maintained inside the two labeled dynamic trees.

What would settle it

A sequence of edits on which the maintained parsing loses local consistency or on which the reported LCS is incorrect, or a polynomial-size data structure that achieves o(log n / log log n) amortized update time for dynamic LCS.

Figures

Figures reproduced from arXiv: 2006.02408 by Karol Pokorski, Panagiotis Charalampopoulos, Pawe{\l} Gawrychowski.

Figure 1
Figure 1. Figure 1: An example PT(T) for T = T0 = abababaabbcdabababcd. We omit the label of each node v with a single child u; L(v) = L(u). T3 = kefhkh and T6 = pq. We denote the nodes Cup(T) by red (filled) squares and the nodes of Cdown(T) with blue (unfilled) squares. dup(T) = abg2 `, ddown(T) = hg3 cd and hence d(T) = abg2 `hg3 cd. Let us consider any sequence of nodes corresponding, for some j < m, to A rj j with rj > 1… view at source ↗
Figure 2
Figure 2. Figure 2: Left: A 2D range tree. Right: Node representing regions [PITH_FULL_IMAGE:figures/full_fig_p017_2.png] view at source ↗
read the original abstract

The longest common substring problem consists in finding a longest string that appears as a (contiguous) substring of two input strings. We consider the dynamic variant of this problem, in which we are to maintain two dynamic strings $S$ and $T$, each of length at most $n$, that undergo edit operations, i.e., substitutions, insertions, and deletions of letters, in order to be able to return a longest common substring after each edit. Amir, Charalampopoulos, Pissis, and Radoszewski [Algorithmica 2020] presented a solution for this problem that requires $\tilde{\mathcal{O}}(n^{2/3})$ time per update. This brought the challenge of determining whether there exists a solution with polylogarithmic update time or we should expect a polynomial (conditional) lower bound. We answer this question by designing an exponentially faster algorithm that processes each edit operation in amortized $O(\log^7 n)$ time with high probability. Our solution relies on exploiting the local consistency of the parsing of a collection of dynamic strings due to Gawrychowski, Karczmarz, Kociumaka, L\k{a}cki, and Sankowski [SODA 2018], and on maintaining two dynamic trees with labeled bicolored leaves. We complement our solution with a lower bound of $\Omega(\log n/ \log\log n)$ for the update time of any polynomial-size data structure that maintains an LCS of two dynamic strings, and the same lower bound for the update time of any $\tilde{\mathcal{O}}(n)$-size data structure that maintains the LCS of a static and a dynamic string. Both lower bounds hold even allowing amortization and randomization. This requires extending P\u{a}tra\c{s}cu's reduction from the lopsided set disjointness problem to the butterfly reachability problem [SICOMP 2011].

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper presents a dynamic algorithm for the longest common substring (LCS) problem on two strings S and T of length at most n undergoing edits (insertions, deletions, substitutions). It maintains an LCS after each edit in amortized O(log^7 n) time with high probability by exploiting the local consistency property of the dynamic-string parsing from Gawrychowski et al. (SODA 2018) and maintaining two labeled bicolored dynamic trees. The work also gives an Ω(log n / log log n) lower bound on update time for any polynomial-size data structure maintaining LCS of two dynamic strings (and the same bound for Õ(n)-size structures maintaining LCS of one static and one dynamic string), obtained by extending Pătraşcu's reduction from lopsided set disjointness to butterfly reachability; both bounds hold even with amortization and randomization.

Significance. If the stated bounds and reductions hold, the result resolves the open question left by the prior Õ(n^{2/3}) solution, showing that polylogarithmic update time is achievable for dynamic LCS and that a logarithmic dependence is necessary in the stated models. The explicit construction for maintaining the parsing and the bicolored-tree operations, together with the independent lower-bound reduction, constitute a substantial technical advance in dynamic string algorithms.

minor comments (3)
  1. [Abstract] The precise failure probability (e.g., 1/n^c) for the high-probability bound should be stated explicitly when the O(log^7 n) claim is first introduced, rather than only in the abstract.
  2. [Section describing the data structures] Notation for the two dynamic trees (e.g., how leaves are labeled and colored, and which tree stores which parsing) should be introduced with a single consistent diagram or table early in the algorithmic section to aid readability.
  3. [Lower-bound section] The lower-bound section should include a short self-contained statement of the butterfly-reachability problem before describing the extension of Pătraşcu's reduction.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of the paper, the accurate summary of our contributions, and the recommendation for minor revision. No major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained

full rationale

The central algorithm maintains the local consistency property of the 2018 SODA parsing under edits via an explicit construction inside two labeled dynamic trees, then derives the amortized O(log^7 n) bound directly from the stated properties of those structures. The lower bound extends Pătraşcu's external reduction to butterfly reachability. No equation or claim reduces by construction to a fitted input, self-definition, or unverified self-citation chain; the 2018 result supplies an independent, externally published primitive whose constants are not tuned to the present target. The derivation therefore contains independent content and does not match any enumerated circularity pattern.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

No free parameters or invented entities; the construction rests on standard dynamic-tree and string-parsing assumptions already present in the literature.

axioms (2)
  • domain assumption Locally consistent parsing of dynamic strings remains valid after each edit (Gawrychowski et al. SODA 2018)
    Invoked to obtain stable blocks that can be maintained in the bicolored trees.
  • standard math Dynamic trees with labeled bicolored leaves support the required navigation and label queries in polylog time
    Standard assumption on dynamic tree data structures used to locate matching blocks.

pith-pipeline@v0.9.0 · 5893 in / 1313 out tokens · 27348 ms · 2026-05-24T14:35:52.259177+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages · 1 internal anchor

  1. [1]

    Thankachan

    Paniz Abedin, Sahar Hooshmand, Arnab Ganguly, and Sharma V. Thankachan. The heaviest induced ancestors problem revisited. In 29th CPM, pages 20:1–20:13, 2018. doi: 10.4230/LIPIcs.CPM.2018.20

  2. [2]

    Pattern matching in dynamic texts

    Stephen Alstrup, Gerth Stølting Brodal, and Theis Rauhe. Pattern matching in dynamic texts. In 11th SODA , pages 819–828, 2000. URL: http://dl.acm.org/citation.cfm? id=338219.338645

  3. [3]

    In 32nd European Conference on Object-Oriented Programming (ECOOP 2018)

    Amihood Amir and Itai Boneh. Locally maximal common factors as a tool for efficient dynamic string algorithms. In 29th CPM, pages 11:1–11:13, 2018. doi:10.4230/LIPIcs. CPM.2018.11

  4. [4]

    Dynamic Palindrome Detection

    Amihood Amir and Itai Boneh. Dynamic palindrome detection. CoRR, abs/1906.09732,

  5. [5]

    Repetition Detection in a Dynamic String

    Amihood Amir, Itai Boneh, Panagiotis Charalampopoulos, and Eitan Kondratovsky. Repetition Detection in a Dynamic String. In 27th ESA , pages 5:1–5:18, 2019. doi: 10.4230/LIPIcs.ESA.2019.5

  6. [6]

    Iliopoulos, Solon P

    Amihood Amir, Panagiotis Charalampopoulos, Costas S. Iliopoulos, Solon P. Pissis, and Jakub Radoszewski. Longest common factor after one edit operation. In 24th SPIRE , pages 14–26, 2017. doi:10.1007/978-3-319-67428-5\_2

  7. [7]

    Pissis, and Jakub Radoszewski

    Amihood Amir, Panagiotis Charalampopoulos, Solon P. Pissis, and Jakub Radoszewski. Longest common substring made fully dynamic. In 27th ESA, pages 6:1–6:17, 2019. doi: 10.4230/LIPIcs.ESA.2019.6. 24

  8. [8]

    Landau, Moshe Lewenstein, and Dina Sokol

    Amihood Amir, Gad M. Landau, Moshe Lewenstein, and Dina Sokol. Dynamic text and static pattern matching. ACM Trans. Algorithms, 3(2):19, 2007. doi:10.1145/1240233. 1240242

  9. [9]

    Iliopoulos, Tomasz Kociu- maka, Solon P

    Panagiotis Charalampopoulos, Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociu- maka, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. Linear- time algorithm for long LCF with k mismatches. In 29th CPM , pages 23:1–23:16, 2018. doi:10.4230/LIPIcs.CPM.2018.23

  10. [10]

    Faster approx- imate pattern matching: A unified approach

    Panagiotis Charalampopoulos, Tomasz Kociumaka, and Philip Wellnitz. Faster approx- imate pattern matching: A unified approach. CoRR, abs/2004.08350, 2020. arXiv: 2004.08350

  11. [11]

    Dictionary matching and indexing with errors and don’t cares

    Richard Cole, Lee-Ad Gottlieb, and Moshe Lewenstein. Dictionary matching and indexing with errors and don’t cares. In 36th STOC, pages 91–100, 2004. doi:10.1145/1007352. 1007374

  12. [12]

    Optimal suffix tree construction with large alphabets

    Martin Farach. Optimal suffix tree construction with large alphabets. In 38th FOCS, pages 137–143, 1997. doi:10.1109/SFCS.1997.646102

  13. [13]

    Muthukrishnan

    Martin Farach and S. Muthukrishnan. Perfect hashing for strings: Formalization and algorithms. In 7th CPM, pages 130–140, 1996. doi:10.1007/3-540-61258-0\_11

  14. [14]

    Harold N. Gabow. Data structures for weighted matching and nearest common ancestors with linking. In 1st SODA, pages 434–443, 1990. URL: http://dl.acm.org/citation. cfm?id=320176.320229

  15. [15]

    Heaviest induced ancestors and longest common substrings

    Travis Gagie, Paweł Gawrychowski, and Yakov Nekrich. Heaviest induced ancestors and longest common substrings. In 25th CCCG, 2013. URL: http://cccg.ca/proceedings/ 2013/papers/paper_29.pdf

  16. [16]

    Optimal dynamic strings

    Paweł Gawrychowski, Adam Karczmarz, Tomasz Kociumaka, Jakub Lacki, and Piotr Sankowski. Optimal dynamic strings. In 29th SODA , pages 1509–1528, 2018. doi: 10.1137/1.9781611975031.99

  17. [17]

    Unifying and strengthening hardness for dynamic problems via the online matrix- vector multiplication conjecture

    Monika Henzinger, Sebastian Krinninger, Danupon Nanongkai, and Thatchaphol Saranu- rak. Unifying and strengthening hardness for dynamic problems via the online matrix- vector multiplication conjecture. In 47th STOC , pages 21–30, 2015. doi:10.1145/ 2746539.2746609

  18. [18]

    Longest common extensions with recompression

    Tomohiro I. Longest common extensions with recompression. In 28th CPM , pages 18:1– 18:15, 2017. doi:10.4230/LIPIcs.CPM.2017.18

  19. [19]

    Faster fully compressed pattern matching by recompression

    Artur Jeż. Faster fully compressed pattern matching by recompression. ACM Transactions on Algorithms, 11(3):20:1–20:43, 2015. doi:10.1145/2631920

  20. [20]

    Recompression: A simple and powerful technique for word equations

    Artur Jeż. Recompression: A simple and powerful technique for word equations. J. ACM, 63(1):4:1–4:51, 2016. doi:10.1145/2743014

  21. [21]

    Starikovskaya

    Tomasz Kociumaka, Jakub Radoszewski, and Tatiana A. Starikovskaya. Longest common substring with approximately k mismatches. Algorithmica, 81(6):2633–2652, 2019. doi: 10.1007/s00453-019-00548-x

  22. [22]

    Starikovskaya, and Hjalte Wedel Vildhøj

    Tomasz Kociumaka, Tatiana A. Starikovskaya, and Hjalte Wedel Vildhøj. Sublinear space algorithms for the longest common substring problem. In 22nd ESA, pages 605–617, 2014. doi:10.1007/978-3-662-44777-2_50 . 25

  23. [23]

    A data structure for multi-dimensional range reporting

    Yakov Nekrich. A data structure for multi-dimensional range reporting. In 23rd SOCG, pages 344–353, 2007. doi:10.1145/1247069.1247130

  24. [24]

    Unifying the landscape of cell-probe lower bounds

    Mihai Patrascu. Unifying the landscape of cell-probe lower bounds. SIAM J. Comput. , 40(3):827–847, 2011. doi:10.1137/09075336X

  25. [25]

    Mihai Pˇ atra¸ scu and Erik D. Demaine. Logarithmic lower bounds in the cell-probe model. SIAM J. Comput. , 35(4):932–963, 2006. doi:10.1137/S0097539705447256

  26. [26]

    Time-space trade-offs for predecessor search

    Mihai Pˇ atra¸ scu and Mikkel Thorup. Time-space trade-offs for predecessor search. In 38th STOC, pages 232–240, 2006. doi:10.1145/1132516.1132551

  27. [27]

    Sahinalp and U

    S. Sahinalp and U. Vishkin. Efficient approximate and dynamic matching of patterns using a labeling paradigm. In 54th FOCS, pages 320–328, 1996. doi:10.1109/SFCS.1996. 548491

  28. [28]

    Starikovskaya and Hjalte Wedel Vildhøj

    Tatiana A. Starikovskaya and Hjalte Wedel Vildhøj. Time-space trade-offs for the longest common substring problem. In 24th CPM , pages 223–234, 2013. doi:10.1007/ 978-3-642-38905-4_22

  29. [29]

    Thankachan, Alberto Apostolico, and Srinivas Aluru

    Sharma V. Thankachan, Alberto Apostolico, and Srinivas Aluru. A provably efficient algo- rithm for the k-mismatch average common substring problem. Journal of Computational Biology, 23(6):472–482, 2016. doi:10.1089/cmb.2015.0235

  30. [30]

    Linear pattern matching algorithms

    Peter Weiner. Linear pattern matching algorithms. In 14th FOCS , pages 1–11, 1973. doi:10.1109/SWAT.1973.13

  31. [31]

    Dan E. Willard. Log-logarithmic worst-case range queries are possible in space Θ(n). In- formation Processing Letters, 17(2):81 – 84, 1983. doi:10.1016/0020-0190(83)90075-3

  32. [32]

    Willard and George S

    Dan E. Willard and George S. Lueker. Adding range restriction capability to dynamic data structures. J. ACM, 32(3):597–617, 1985. doi:10.1145/3828.3839. 26