OVT-MLCS: An Online Visual Tool for MLCS Mining from Long or Big Sequences
Pith reviewed 2026-05-16 16:12 UTC · model grok-4.3
The pith
OVT-MLCS provides an online tool that mines multiple longest common subsequences from sequences up to length 5000 using a new key-point algorithm.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors develop OVT-MLCS as a complete system that performs effective online mining, storing, and downloading of MLCSs from long or big sequences of scale 3 to 5000. The system rests on the KP-MLCS algorithm for mining big sequences, a compact representation method that reveals common patterns, and new techniques for real-time graphic visualization and serialization. It supplies user-friendly interactive functions for inspection and analysis, thereby enabling wider practical use of MLCS results.
What carries the argument
KP-MLCS, a key point-based algorithm for exact MLCS mining on big sequences, together with compact representation of all results and real-time graphic visualization plus serialization techniques.
If this is right
- Users can mine and visualize MLCSs from sequences up to length 5000 directly in a browser without custom coding.
- Compact representation lets analysts quickly spot shared patterns across all mined MLCSs.
- Interactive graph and text views support inspection that was previously unavailable for long sequences.
- The tool's storage and download features make MLCS results portable for downstream applications.
Where Pith is reading between the lines
- The same visualization layer could be adapted to display MLCS results from other sequence-mining tools.
- If KP-MLCS proves exact at scale, it could serve as a reference implementation for testing faster heuristics on long sequences.
- Real-world datasets from bioinformatics or text corpora of length 5000 would provide natural test cases for the claimed performance.
Load-bearing premise
The KP-MLCS algorithm actually scales to exact MLCS computation on sequences of length 5000.
What would settle it
Run OVT-MLCS on a set of sequences of length 5000 whose true MLCS set is known in advance and check whether it returns the complete correct set within reasonable time and memory limits.
Figures
read the original abstract
Mining multiple longest common subsequences (\textit{MLCS}) from a set of sequences of three or more over a finite alphabet $\Sigma$ (a classical NP-hard problem) is an important task in a wide variety of application fields. Unfortunately, there is still no exact \textit{MLCS} algorithm/tool that can handle long (length $\ge$ 1,000) or big (length $\ge$ 10,000) sequences, which seriously hinders the development and utilization of massive long or big sequences from various application fields today. To address the challenge, we first propose a novel key point-based \textit{MLCS} algorithm for mining big sequences, called \textit{KP-MLCS}, and then present a new method, which can compactly represent all mined \textit{MLCSs} and quickly reveal common patterns among them. Furthermore, by introducing some new techniques, e.g., real-time graphic visualization and serialization, we have developed a new online visual \textit{MLCS} mining tool, called OVT-MLCS. OVT-MLCS demonstrates that it not only enables effective online mining, storing, and downloading of \textit{MLCSs} in the form of graphs and text from long or big sequences with a scale of 3 to 5000 but also provides user-friendly interactive functions to facilitate inspection and analysis of the mined \textit{MLCS}s. We believe that the functions provided by OVT-MLCS will promote stronger and wider applications of \textit{MLCS}.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes KP-MLCS, a key-point-based exact algorithm for mining multiple longest common subsequences (MLCS) from long or big sequences, a compact representation technique for the resulting MLCSs, and the OVT-MLCS online visual tool that supports real-time graphic visualization, interactive analysis, storage, and download of results for input sequences with lengths ranging from 3 to 5000.
Significance. If the scalability claims hold, the work would address a recognized practical gap in exact MLCS mining for large-scale sequence data, potentially enabling broader applications in bioinformatics, pattern discovery, and database analytics where existing exact methods are limited to shorter sequences.
major comments (2)
- [Abstract] Abstract: The central claim that KP-MLCS and OVT-MLCS enable 'effective' exact mining, storing, and visualization for sequences of length up to 5000 is unsupported by any reported runtime measurements, memory usage figures, benchmark instances, or asymptotic complexity bounds, leaving the effectiveness qualifier unverified.
- [KP-MLCS algorithm description] Algorithm description sections: No formal analysis or proof is provided establishing that the key-point reduction in KP-MLCS avoids exponential blow-up in the worst case for arbitrary sequences of length 5000, which is required to substantiate the exactness guarantee for an NP-hard problem at the claimed scale.
minor comments (1)
- [Abstract] The abstract contains LaTeX markup artifacts (e.g., textit) that should be rendered or removed for readability in the final version.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments on our manuscript. We address each major comment point by point below, providing the strongest honest defense of the work while acknowledging where revisions are needed to strengthen the claims.
read point-by-point responses
-
Referee: [Abstract] Abstract: The central claim that KP-MLCS and OVT-MLCS enable 'effective' exact mining, storing, and visualization for sequences of length up to 5000 is unsupported by any reported runtime measurements, memory usage figures, benchmark instances, or asymptotic complexity bounds, leaving the effectiveness qualifier unverified.
Authors: We agree that the abstract's use of 'effective' requires empirical support to be fully substantiated. The manuscript currently relies on the demonstration of functionality up to length 5000 without including explicit runtime, memory, or benchmark data. In the revised manuscript we will add a new experimental evaluation section reporting runtime measurements, peak memory usage, the specific benchmark sequence sets employed, and derived complexity observations for sequences of length 3 to 5000. This will directly verify the effectiveness claim. revision: yes
-
Referee: [KP-MLCS algorithm description] Algorithm description sections: No formal analysis or proof is provided establishing that the key-point reduction in KP-MLCS avoids exponential blow-up in the worst case for arbitrary sequences of length 5000, which is required to substantiate the exactness guarantee for an NP-hard problem at the claimed scale.
Authors: The KP-MLCS algorithm identifies key points as the minimal set of positions where character matches can contribute to an LCS, then restricts the dynamic-programming state space to only those positions and their immediate predecessors. This pruning is exact (no valid LCS is lost) and empirically reduces the explored states far below the full O(n^k) table for k sequences. We will revise the algorithm sections to include a clearer step-by-step description of the key-point construction, a worked example on a length-5000 instance, and a discussion of observed versus theoretical complexity. However, because MLCS remains NP-hard, a general proof that the reduction eliminates exponential blow-up for every possible arbitrary sequence of length 5000 is not available and cannot be supplied without additional restrictive assumptions on the input. revision: partial
- A formal proof that the key-point reduction in KP-MLCS guarantees avoidance of exponential blow-up in the worst case for arbitrary sequences of length 5000
Circularity Check
No circularity: tool and algorithm claims are constructive, not self-referential derivations
full rationale
The paper introduces KP-MLCS as a novel algorithm and OVT-MLCS as a visualization tool for MLCS mining. The provided abstract and description contain no equations, fitted parameters, self-citations used as load-bearing premises, or ansatzes that reduce the central claims to their own inputs by construction. The scalability assertion for sequences up to length 5000 is presented as an empirical outcome of the implemented system rather than a mathematical derivation that loops back on itself. No load-bearing step matches any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math MLCS is a classical NP-hard problem
Reference graph
Works this paper leans on
-
[1]
Alexander M Aravanis, Mark Lee, and Richard D Klausner. 2017. Next-generation sequencing of circulating tumor DNA for early cancer detection.Cell168, 4 (2017), 571–574
work page 2017
-
[2]
Yanni Li et al. 2016. A novel fast and memory efficient parallel MLCS algorithm for long and large-scale sequences alignments. InProceedings of the IEEE 32nd International Conference on Data Engineering (ICDE). IEEE, 1170–1181
work page 2016
-
[3]
Yanni Li, Hui Li, Tihua Duan, Sheng Wang, Zhi Wang, and Yang Cheng. 2016. A real linear and parallel multiple longest common subsequences (MLCS) algorithm. InProceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1725–1734
work page 2016
-
[4]
Yanni Li, Bing Liu, Tihua Duan, Zhi Wang, Hui Li, and Jiangtao Cui. 2025. A Novel Key Point based MLCS Algorithm for Big Sequences Mining.IEEE Transactions on Knowledge and Data Engineering37, 1 (2025), 15–28
work page 2025
-
[5]
Yanni Li, Bing Liu, Zhi Wang, Jiangtao Cui, et al. 2020. COVID-19 Evolves in Human Hosts. InProceedings of the 26st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1–10
work page 2020
-
[6]
David Maier. 1978. The complexity of some problems on subsequences and supersequences.Journal of the ACM (JACM)25, 2 (1978), 322–336
work page 1978
-
[7]
Scott McGinnis and Thomas L Madden. 2004. BLAST: at the core of a powerful and diverse set of sequence analysis tools.Nucleic acids research32, suppl_2 (2004), W20–W25
work page 2004
-
[8]
Bianca Nogrady et al. 2020. How cancer genomics is transforming diagnosis and treatment.Nature579, 7800 (2020), S10
work page 2020
-
[9]
Sergey Nurk, Sergey Koren, Arang Rhie, et al. 2022. The complete sequence of a human genome.Science376, 6588 (2022), 44–53
work page 2022
-
[10]
Fabian Sievers, Andreas Wilm, David Dineen, et al. 2011. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Molecular systems biology7, 1 (2011), 539
work page 2011
-
[11]
Shiwei Wei, Yuping Wang, et al. 2023. A Branch Elimination-based Efficient Algorithm for Large-scale Multiple Longest Common Subsequence Problem. IEEE Transactions on Knowledge and Data Engineering35, 03 (2023), 2179–2192. 4
work page 2023
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.