OVT-MLCS: An Online Visual Tool for MLCS Mining from Long or Big Sequences

Bing Liu; Hui Li; Liyong Zhang; Tihua Duan; Yanni Li; Zhi Wang

arxiv: 2604.13037 · v1 · submitted 2026-01-09 · 💻 cs.DB · cs.AI

OVT-MLCS: An Online Visual Tool for MLCS Mining from Long or Big Sequences

Zhi Wang , Yanni Li , Tihua Duan , Bing Liu , Liyong Zhang , Hui Li This is my paper

Pith reviewed 2026-05-16 16:12 UTC · model grok-4.3

classification 💻 cs.DB cs.AI

keywords MLCS mininglongest common subsequencessequence analysisonline visualizationKP-MLCS algorithmbig sequencesdata mining tool

0 comments

The pith

OVT-MLCS provides an online tool that mines multiple longest common subsequences from sequences up to length 5000 using a new key-point algorithm.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces OVT-MLCS, an online visual tool for mining multiple longest common subsequences from sets of sequences over a finite alphabet. It first proposes the KP-MLCS algorithm to handle big sequences, then adds compact representation of all mined results and real-time graphic visualization with serialization. The tool supports mining, storing, downloading, and interactive inspection of MLCSs in graph and text forms for sequences ranging from length 3 to 5000. This fills a gap where no prior exact MLCS tool could process long or big sequences.

Core claim

The authors develop OVT-MLCS as a complete system that performs effective online mining, storing, and downloading of MLCSs from long or big sequences of scale 3 to 5000. The system rests on the KP-MLCS algorithm for mining big sequences, a compact representation method that reveals common patterns, and new techniques for real-time graphic visualization and serialization. It supplies user-friendly interactive functions for inspection and analysis, thereby enabling wider practical use of MLCS results.

What carries the argument

KP-MLCS, a key point-based algorithm for exact MLCS mining on big sequences, together with compact representation of all results and real-time graphic visualization plus serialization techniques.

If this is right

Users can mine and visualize MLCSs from sequences up to length 5000 directly in a browser without custom coding.
Compact representation lets analysts quickly spot shared patterns across all mined MLCSs.
Interactive graph and text views support inspection that was previously unavailable for long sequences.
The tool's storage and download features make MLCS results portable for downstream applications.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same visualization layer could be adapted to display MLCS results from other sequence-mining tools.
If KP-MLCS proves exact at scale, it could serve as a reference implementation for testing faster heuristics on long sequences.
Real-world datasets from bioinformatics or text corpora of length 5000 would provide natural test cases for the claimed performance.

Load-bearing premise

The KP-MLCS algorithm actually scales to exact MLCS computation on sequences of length 5000.

What would settle it

Run OVT-MLCS on a set of sequences of length 5000 whose true MLCS set is known in advance and check whether it returns the complete correct set within reasonable time and memory limits.

Figures

Figures reproduced from arXiv: 2604.13037 by Bing Liu, Hui Li, Liyong Zhang, Tihua Duan, Yanni Li, Zhi Wang.

**Figure 1.** Figure 1: The architecture of OVT-MLCS, where the compo [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗

**Figure 3.** Figure 3: OVT-MLCS mined MLCS results 𝐷𝐴𝐺𝐾𝑃 diagram. Display of Mined MLCS Results. To overcome the challenging issue faced by existing MLCS algorithms/tools (see Section 1, paragraph 3, issue 2), OVT-MLCS compresses and displays all mined MLCSs at once in the form of a 𝐷𝐴𝐺𝐾𝑃 graph, in which each path represents an MLCS solution. Moreover, by introducing Antv-X6 open source graphics engine and SVG (Scalable Vector … view at source ↗

**Figure 4.** Figure 4: OVT-MLCS Mining Results Insight (a) [PITH_FULL_IMAGE:figures/full_fig_p003_4.png] view at source ↗

**Figure 5.** Figure 5: OVT-MLCS Mining Results Insight (b). 2.3 Mining Result Insight To overcome the challenges of existing MLCS algorithms/ tools and to provide mining results insight to users, based on "Display of Mined MLCS Results"(see Section 2.2), OVT-MLCS further presents the following novel ways for displaying and interacting with mining results in a user-friendly manner. That is, 1) OVTMLCS presents some visual line c… view at source ↗

read the original abstract

Mining multiple longest common subsequences (\textit{MLCS}) from a set of sequences of three or more over a finite alphabet $\Sigma$ (a classical NP-hard problem) is an important task in a wide variety of application fields. Unfortunately, there is still no exact \textit{MLCS} algorithm/tool that can handle long (length $\ge$ 1,000) or big (length $\ge$ 10,000) sequences, which seriously hinders the development and utilization of massive long or big sequences from various application fields today. To address the challenge, we first propose a novel key point-based \textit{MLCS} algorithm for mining big sequences, called \textit{KP-MLCS}, and then present a new method, which can compactly represent all mined \textit{MLCSs} and quickly reveal common patterns among them. Furthermore, by introducing some new techniques, e.g., real-time graphic visualization and serialization, we have developed a new online visual \textit{MLCS} mining tool, called OVT-MLCS. OVT-MLCS demonstrates that it not only enables effective online mining, storing, and downloading of \textit{MLCSs} in the form of graphs and text from long or big sequences with a scale of 3 to 5000 but also provides user-friendly interactive functions to facilitate inspection and analysis of the mined \textit{MLCS}s. We believe that the functions provided by OVT-MLCS will promote stronger and wider applications of \textit{MLCS}.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

OVT-MLCS claims to mine exact MLCS from sequences up to length 5000 but supplies no experiments or analysis to back the scalability.

read the letter

The paper introduces KP-MLCS, a key-point based algorithm for exact multiple longest common subsequence mining, and wraps it in OVT-MLCS, an online visual tool with real-time graphics, compact representation of results, and download options. The stated goal is to handle sequences from length 3 to 5000 where prior exact methods fall short. They also add interactive inspection features so users can examine common patterns across the mined subsequences. That combination of algorithmic tweak plus practical interface is the main concrete step forward. It directly targets a known pain point in bioinformatics and string data mining where long sequences appear regularly. The description of the visualization and serialization steps is clear enough that a practitioner could see how the tool might fit into an existing workflow. The central problem is that none of the performance claims are supported. The abstract asserts that KP-MLCS enables effective online mining at length 5000, yet there are no runtimes, memory figures, test instances, or asymptotic bounds. MLCS is NP-hard, so any exact method must either exploit special input structure or pay exponential cost in the worst case. Without data showing which case applies here, the word effective remains an untested assertion. The full manuscript may contain the missing experiments, but nothing in the provided text demonstrates that the approach actually scales. This is the sort of work that would interest groups already working on sequence analysis who need a usable system more than a new theoretical bound. A reader looking for a ready tool description could extract some value from the interface details. It is worth sending to peer review so referees can check whether the experiments, once shown, actually support the length-5000 claim or whether the contribution stays at the level of a software prototype.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes KP-MLCS, a key-point-based exact algorithm for mining multiple longest common subsequences (MLCS) from long or big sequences, a compact representation technique for the resulting MLCSs, and the OVT-MLCS online visual tool that supports real-time graphic visualization, interactive analysis, storage, and download of results for input sequences with lengths ranging from 3 to 5000.

Significance. If the scalability claims hold, the work would address a recognized practical gap in exact MLCS mining for large-scale sequence data, potentially enabling broader applications in bioinformatics, pattern discovery, and database analytics where existing exact methods are limited to shorter sequences.

major comments (2)

[Abstract] Abstract: The central claim that KP-MLCS and OVT-MLCS enable 'effective' exact mining, storing, and visualization for sequences of length up to 5000 is unsupported by any reported runtime measurements, memory usage figures, benchmark instances, or asymptotic complexity bounds, leaving the effectiveness qualifier unverified.
[KP-MLCS algorithm description] Algorithm description sections: No formal analysis or proof is provided establishing that the key-point reduction in KP-MLCS avoids exponential blow-up in the worst case for arbitrary sequences of length 5000, which is required to substantiate the exactness guarantee for an NP-hard problem at the claimed scale.

minor comments (1)

[Abstract] The abstract contains LaTeX markup artifacts (e.g., textit) that should be rendered or removed for readability in the final version.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive and detailed comments on our manuscript. We address each major comment point by point below, providing the strongest honest defense of the work while acknowledging where revisions are needed to strengthen the claims.

read point-by-point responses

Referee: [Abstract] Abstract: The central claim that KP-MLCS and OVT-MLCS enable 'effective' exact mining, storing, and visualization for sequences of length up to 5000 is unsupported by any reported runtime measurements, memory usage figures, benchmark instances, or asymptotic complexity bounds, leaving the effectiveness qualifier unverified.

Authors: We agree that the abstract's use of 'effective' requires empirical support to be fully substantiated. The manuscript currently relies on the demonstration of functionality up to length 5000 without including explicit runtime, memory, or benchmark data. In the revised manuscript we will add a new experimental evaluation section reporting runtime measurements, peak memory usage, the specific benchmark sequence sets employed, and derived complexity observations for sequences of length 3 to 5000. This will directly verify the effectiveness claim. revision: yes
Referee: [KP-MLCS algorithm description] Algorithm description sections: No formal analysis or proof is provided establishing that the key-point reduction in KP-MLCS avoids exponential blow-up in the worst case for arbitrary sequences of length 5000, which is required to substantiate the exactness guarantee for an NP-hard problem at the claimed scale.

Authors: The KP-MLCS algorithm identifies key points as the minimal set of positions where character matches can contribute to an LCS, then restricts the dynamic-programming state space to only those positions and their immediate predecessors. This pruning is exact (no valid LCS is lost) and empirically reduces the explored states far below the full O(n^k) table for k sequences. We will revise the algorithm sections to include a clearer step-by-step description of the key-point construction, a worked example on a length-5000 instance, and a discussion of observed versus theoretical complexity. However, because MLCS remains NP-hard, a general proof that the reduction eliminates exponential blow-up for every possible arbitrary sequence of length 5000 is not available and cannot be supplied without additional restrictive assumptions on the input. revision: partial

standing simulated objections not resolved

A formal proof that the key-point reduction in KP-MLCS guarantees avoidance of exponential blow-up in the worst case for arbitrary sequences of length 5000

Circularity Check

0 steps flagged

No circularity: tool and algorithm claims are constructive, not self-referential derivations

full rationale

The paper introduces KP-MLCS as a novel algorithm and OVT-MLCS as a visualization tool for MLCS mining. The provided abstract and description contain no equations, fitted parameters, self-citations used as load-bearing premises, or ansatzes that reduce the central claims to their own inputs by construction. The scalability assertion for sequences up to length 5000 is presented as an empirical outcome of the implemented system rather than a mathematical derivation that loops back on itself. No load-bearing step matches any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper relies on the standard statement that MLCS is NP-hard and introduces new algorithmic techniques without additional free parameters or invented entities.

axioms (1)

standard math MLCS is a classical NP-hard problem
Explicitly stated in the abstract as background for the challenge.

pith-pipeline@v0.9.0 · 5598 in / 1177 out tokens · 49694 ms · 2026-05-16T16:12:48.581624+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

11 extracted references · 11 canonical work pages

[1]

Alexander M Aravanis, Mark Lee, and Richard D Klausner. 2017. Next-generation sequencing of circulating tumor DNA for early cancer detection.Cell168, 4 (2017), 571–574

work page 2017
[2]

Yanni Li et al. 2016. A novel fast and memory efficient parallel MLCS algorithm for long and large-scale sequences alignments. InProceedings of the IEEE 32nd International Conference on Data Engineering (ICDE). IEEE, 1170–1181

work page 2016
[3]

Yanni Li, Hui Li, Tihua Duan, Sheng Wang, Zhi Wang, and Yang Cheng. 2016. A real linear and parallel multiple longest common subsequences (MLCS) algorithm. InProceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1725–1734

work page 2016
[4]

Yanni Li, Bing Liu, Tihua Duan, Zhi Wang, Hui Li, and Jiangtao Cui. 2025. A Novel Key Point based MLCS Algorithm for Big Sequences Mining.IEEE Transactions on Knowledge and Data Engineering37, 1 (2025), 15–28

work page 2025
[5]

Yanni Li, Bing Liu, Zhi Wang, Jiangtao Cui, et al. 2020. COVID-19 Evolves in Human Hosts. InProceedings of the 26st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1–10

work page 2020
[6]

David Maier. 1978. The complexity of some problems on subsequences and supersequences.Journal of the ACM (JACM)25, 2 (1978), 322–336

work page 1978
[7]

Scott McGinnis and Thomas L Madden. 2004. BLAST: at the core of a powerful and diverse set of sequence analysis tools.Nucleic acids research32, suppl_2 (2004), W20–W25

work page 2004
[8]

Bianca Nogrady et al. 2020. How cancer genomics is transforming diagnosis and treatment.Nature579, 7800 (2020), S10

work page 2020
[9]

Sergey Nurk, Sergey Koren, Arang Rhie, et al. 2022. The complete sequence of a human genome.Science376, 6588 (2022), 44–53

work page 2022
[10]

Fabian Sievers, Andreas Wilm, David Dineen, et al. 2011. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Molecular systems biology7, 1 (2011), 539

work page 2011
[11]

Shiwei Wei, Yuping Wang, et al. 2023. A Branch Elimination-based Efficient Algorithm for Large-scale Multiple Longest Common Subsequence Problem. IEEE Transactions on Knowledge and Data Engineering35, 03 (2023), 2179–2192. 4

work page 2023

[1] [1]

Alexander M Aravanis, Mark Lee, and Richard D Klausner. 2017. Next-generation sequencing of circulating tumor DNA for early cancer detection.Cell168, 4 (2017), 571–574

work page 2017

[2] [2]

Yanni Li et al. 2016. A novel fast and memory efficient parallel MLCS algorithm for long and large-scale sequences alignments. InProceedings of the IEEE 32nd International Conference on Data Engineering (ICDE). IEEE, 1170–1181

work page 2016

[3] [3]

Yanni Li, Hui Li, Tihua Duan, Sheng Wang, Zhi Wang, and Yang Cheng. 2016. A real linear and parallel multiple longest common subsequences (MLCS) algorithm. InProceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1725–1734

work page 2016

[4] [4]

Yanni Li, Bing Liu, Tihua Duan, Zhi Wang, Hui Li, and Jiangtao Cui. 2025. A Novel Key Point based MLCS Algorithm for Big Sequences Mining.IEEE Transactions on Knowledge and Data Engineering37, 1 (2025), 15–28

work page 2025

[5] [5]

Yanni Li, Bing Liu, Zhi Wang, Jiangtao Cui, et al. 2020. COVID-19 Evolves in Human Hosts. InProceedings of the 26st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1–10

work page 2020

[6] [6]

David Maier. 1978. The complexity of some problems on subsequences and supersequences.Journal of the ACM (JACM)25, 2 (1978), 322–336

work page 1978

[7] [7]

Scott McGinnis and Thomas L Madden. 2004. BLAST: at the core of a powerful and diverse set of sequence analysis tools.Nucleic acids research32, suppl_2 (2004), W20–W25

work page 2004

[8] [8]

Bianca Nogrady et al. 2020. How cancer genomics is transforming diagnosis and treatment.Nature579, 7800 (2020), S10

work page 2020

[9] [9]

Sergey Nurk, Sergey Koren, Arang Rhie, et al. 2022. The complete sequence of a human genome.Science376, 6588 (2022), 44–53

work page 2022

[10] [10]

Fabian Sievers, Andreas Wilm, David Dineen, et al. 2011. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Molecular systems biology7, 1 (2011), 539

work page 2011

[11] [11]

Shiwei Wei, Yuping Wang, et al. 2023. A Branch Elimination-based Efficient Algorithm for Large-scale Multiple Longest Common Subsequence Problem. IEEE Transactions on Knowledge and Data Engineering35, 03 (2023), 2179–2192. 4

work page 2023