Efficient Mining of Low-Utility Sequential Patterns
Pith reviewed 2026-05-18 07:48 UTC · model grok-4.3
The pith
New algorithms mine low-utility sequential patterns faster and with less memory by extending candidates and pruning redundancies.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By redefining sequence utility and employing the maximal non-mutually contained sequence set to prune candidates, the LUSPM_e algorithm discovers the complete set of low-utility sequential patterns more efficiently than baseline and shrinkage-based methods, using less runtime and memory.
What carries the argument
The maximal non-mutually contained sequence set together with extension-based generation and multiple pruning strategies that eliminate redundant subsequence checks while preserving completeness.
If this is right
- LUSPM_e requires less runtime and memory than LUSPM_s while still returning every valid low-utility pattern.
- The sequence-utility chain records utility values without repeated full database scans.
- Pruning via the maximal non-mutually contained set removes many redundant generation steps in both optimized algorithms.
- Both LUSPM_s and LUSPM_e scale to larger sequence collections than the exhaustive baseline.
Where Pith is reading between the lines
- The extension-plus-pruning design could be reused for other pattern tasks where low values of a measure are the objects of interest, such as rare-event detection.
- Parallelizing the extension operations across multiple processors would likely improve speed further on very long sequence collections.
- Domain-specific utility redefinitions for genomics or network logs could be plugged into the same chain structure with only local changes.
Load-bearing premise
The maximal non-mutually contained sequence set and associated pruning strategies correctly eliminate redundant candidates without missing any valid low-utility sequential patterns or violating the redefined utility measure.
What would settle it
On a small dataset where every possible subsequence can be exhaustively checked, LUSPM_e either reports a pattern whose utility is miscalculated or omits a pattern that meets the low-utility threshold.
Figures
read the original abstract
Discovering valuable insights from rich data is a crucial task for exploratory data analysis. Sequential pattern mining (SPM) has found widespread applications across various domains. In recent years, low-utility sequential pattern mining (LUSPM) has shown strong potential in applications such as intrusion detection and genomic sequence analysis. However, existing research in utility-based SPM focuses on high-utility sequential patterns, and the definitions and strategies used in high-utility SPM cannot be directly applied to LUSPM. Moreover, no algorithms have yet been developed specifically for mining low-utility sequential patterns. To address these problems, we formalize the LUSPM problem, redefine sequence utility, and introduce a compact data structure called the sequence-utility chain to efficiently record utility information. Furthermore, we propose three novel algorithm--LUSPM_b, LUSPM_s, and LUSPM_e--to discover the complete set of low-utility sequential patterns. LUSPM_b serves as an exhaustive baseline, while LUSPM_s and LUSPM_e build upon it, generating subsequences through shrinkage and extension operations, respectively. In addition, we introduce the maximal non-mutually contained sequence set and incorporate multiple pruning strategies, which significantly reduce redundant operations in both LUSPM_s and LUSPM_e. Finally, extensive experimental results demonstrate that both LUSPM_s and LUSPM_e substantially outperform LUSPM_b and exhibit excellent scalability. Notably, LUSPM_e achieves superior efficiency, requiring less runtime and memory consumption than LUSPM_s. Our code is available at https://github.com/Zhidong-Lin/LUSPM.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the problem of low-utility sequential pattern mining (LUSPM), which has not been addressed by prior algorithms focused on high-utility patterns. It redefines sequence utility, presents a sequence-utility chain data structure, and develops three algorithms: an exhaustive baseline LUSPM_b, a shrinkage-based LUSPM_s, and an extension-based LUSPM_e. Pruning is achieved via the maximal non-mutually contained sequence set. Experiments indicate that LUSPM_s and LUSPM_e are more efficient than the baseline, with LUSPM_e requiring the least runtime and memory while returning the complete set of low-utility sequential patterns.
Significance. This research opens up a new direction in utility-based sequential pattern mining by targeting low-utility patterns, relevant for applications like intrusion detection and genomic analysis. The provision of open code supports reproducibility. If the pruning strategies are proven sound, the efficiency improvements could make LUSPM practical for large datasets.
major comments (2)
- [§3.4 (Maximal Non-Mutually Contained Sequence Set and Pruning)] §3.4 (Maximal Non-Mutually Contained Sequence Set and Pruning): The pruning strategies rely on the assumption that the redefined utility allows safe elimination of candidates using maximal non-mutually contained sequences. The manuscript should include a detailed proof or empirical verification that no low-utility pattern is discarded by these strategies, as this is central to both the completeness claim and the efficiency advantages reported for LUSPM_e.
- [§5 (Experiments)] §5 (Experiments): While outperformance is claimed, the section should specify the exact datasets, utility thresholds, number of runs for timing, and include statistical significance tests to substantiate the scalability and efficiency claims.
minor comments (2)
- [Abstract] Abstract: Consider adding one or two example datasets or typical utility threshold values to give readers a concrete sense of the experimental setup.
- [§2 (Related Work)] §2 (Related Work): Ensure all citations to high-utility SPM papers are up to date and include a brief comparison table if space allows.
Simulated Author's Rebuttal
Thank you for the constructive feedback on our manuscript. We address each major comment below and will revise the manuscript to strengthen the presentation of our contributions.
read point-by-point responses
-
Referee: §3.4 (Maximal Non-Mutually Contained Sequence Set and Pruning): The pruning strategies rely on the assumption that the redefined utility allows safe elimination of candidates using maximal non-mutually contained sequences. The manuscript should include a detailed proof or empirical verification that no low-utility pattern is discarded by these strategies, as this is central to both the completeness claim and the efficiency advantages reported for LUSPM_e.
Authors: We agree that a formal proof is necessary to rigorously establish the soundness of the pruning strategies. In the revised manuscript, we will add a detailed proof in §3.4 showing that the maximal non-mutually contained sequence set combined with the redefined sequence utility preserves all low-utility patterns. The proof will be based on the anti-monotonicity of the utility measure with respect to sequence containment and the definition of the maximal set. We will also include a brief empirical verification using small synthetic datasets to illustrate that no valid patterns are lost. revision: yes
-
Referee: §5 (Experiments): While outperformance is claimed, the section should specify the exact datasets, utility thresholds, number of runs for timing, and include statistical significance tests to substantiate the scalability and efficiency claims.
Authors: We appreciate this suggestion for improving experimental transparency. In the revised §5, we will explicitly name all datasets (with their key characteristics), report the precise utility thresholds used in each experiment, state that timings are averaged over 5 independent runs, and add statistical significance tests (paired t-tests with p-values) to support the efficiency comparisons between LUSPM_b, LUSPM_s, and LUSPM_e. revision: yes
Circularity Check
Algorithmic construction with external validation; no derivation reduces to inputs by construction
full rationale
The paper formalizes LUSPM, redefines sequence utility, introduces the sequence-utility chain data structure, and proposes three algorithms (LUSPM_b baseline, LUSPM_s via shrinkage, LUSPM_e via extension) plus pruning over the maximal non-mutually contained sequence set. These steps are constructive definitions and algorithmic procedures whose completeness and efficiency claims are supported by experimental comparisons on runtime/memory rather than by any equation or parameter that is fitted to the target output and then renamed as a prediction. No self-citation chain, uniqueness theorem, or ansatz is invoked to force the central result; the work remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Data mining: an overview from a database perspective,
M.-S. Chen, J. Han, and P. S. Yu, “Data mining: an overview from a database perspective,”IEEE Transactions on Knowledge and Data Engineering, vol. 8, no. 6, pp. 866–883, 2002
work page 2002
-
[2]
PrefixSpan: Mining sequential patterns efficiently by prefix- projected pattern growth,
J. Han, J. Pei, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M. Hsu, “PrefixSpan: Mining sequential patterns efficiently by prefix- projected pattern growth,” inThe 17th International Conference on Data Engineering. IEEE, 2001, pp. 215–224
work page 2001
-
[3]
Mining frequent patterns without candidate generation: A frequent-pattern tree approach,
J. Han, J. Pei, Y . Yin, and R. Mao, “Mining frequent patterns without candidate generation: A frequent-pattern tree approach,”Data Mining and Knowledge Discovery, vol. 8, no. 1, pp. 53–87, 2004
work page 2004
-
[4]
Mining association rules between sets of items in large databases,
R. Agrawal, T. Imieli ´nski, and A. Swami, “Mining association rules between sets of items in large databases,” inThe 22nd ACM SIGMOD International Conference on Management of Data, 1993, pp. 207–216
work page 1993
-
[5]
Mining cross-level high utility itemsets in unstable and negative profit databases,
N. Tung, T. D. Nguyen, L. T. Nguyen, D.-L. Vu, P. Fournier-Viger, and B. V o, “Mining cross-level high utility itemsets in unstable and negative profit databases,”IEEE Transactions on Knowledge and Data Engineering, vol. 37, no. 9, pp. 5420–5435, 2025
work page 2025
-
[6]
Toward targeted mining of RFM patterns,
X. Chen, W. Gan, Z. Chen, J. Zhu, R. Cai, and P. S. Yu, “Toward targeted mining of RFM patterns,”IEEE Transactions on Neural Networks and Learning Systems, vol. 36, no. 9, pp. 16 619–16 632, 2025
work page 2025
-
[7]
R. Agrawal and R. Srikant, “Mining sequential patterns,” inThe 11th International Conference on Data Engineering. IEEE, 1995, pp. 3–14
work page 1995
-
[8]
P. Qiu, Y . Gong, Y . Zhao, L. Cao, C. Zhang, and X. Dong, “An efficient method for modeling nonoccurring behaviors by negative sequential patterns with loose constraints,”IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 4, pp. 1864–1878, 2021
work page 2021
-
[9]
e-RNSP: An efficient method for mining repetition negative sequential patterns,
X. Dong, Y . Gong, and L. Cao, “e-RNSP: An efficient method for mining repetition negative sequential patterns,”IEEE Transactions on Cybernetics, vol. 50, no. 5, pp. 2084–2096, 2018
work page 2084
-
[10]
A survey of parallel sequential pattern mining,
W. Gan, J. C. Lin, P. Fournier-Viger, H. Chao, and P. S. Yu, “A survey of parallel sequential pattern mining,”ACM Transactions on Knowledge Discovery from Data, vol. 13, no. 3, pp. 1–34, 2019
work page 2019
-
[11]
Anomaly rule detection in sequence data,
W. Gan, L. Chen, S. Wan, J. Chen, and C.-M. Chen, “Anomaly rule detection in sequence data,”IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 12, pp. 12 095–12 108, 2023
work page 2023
-
[12]
Targeted mining precise- positioning episode rules,
J. Zhu, X. Chen, W. Gan, Z. Chen, and P. S. Yu, “Targeted mining precise- positioning episode rules,”IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 9, no. 1, pp. 904–917, 2025
work page 2025
-
[13]
USpan: An efficient algorithm for mining high utility sequential patterns,
J. Yin, Z. Zheng, and L. Cao, “USpan: An efficient algorithm for mining high utility sequential patterns,” inThe 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2012, pp. 660– 668
work page 2012
-
[14]
ProUM: Projection-based utility mining on sequence data,
W. Gan, J. C. Lin, J. Zhang, H. Chao, H. Fujita, and P. S. Yu, “ProUM: Projection-based utility mining on sequence data,”Information Sciences, vol. 513, pp. 222–240, 2020
work page 2020
-
[15]
On incremental high utility sequential pattern mining,
J. Wang and J. Huang, “On incremental high utility sequential pattern mining,”ACM Transactions on Intelligent Systems and Technology, vol. 9, no. 5, pp. 1–26, 2018
work page 2018
-
[16]
Mining high utility mobile sequential patterns in mobile commerce environments,
B. Shie, H. Hsiao, V . S. Tseng, and P. S. Yu, “Mining high utility mobile sequential patterns in mobile commerce environments,” inThe 16th International Conference on Database Systems for Advanced Applications, 2011, pp. 224–238
work page 2011
-
[17]
Applying the maximum utility measure in high utility sequential pattern mining,
G. Lan, T. Hong, V . S. Tseng, and S. Wang, “Applying the maximum utility measure in high utility sequential pattern mining,”Expert Systems With Applications, vol. 41, no. 11, pp. 5071–5081, 2014
work page 2014
-
[18]
CRoM and HuspExt: Improving efficiency of high utility sequential pattern extraction,
O. K. Alkan and P. Karagoz, “CRoM and HuspExt: Improving efficiency of high utility sequential pattern extraction,”IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 10, pp. 2645–2657, 2015
work page 2015
-
[19]
HUSP-SP: Faster utility mining on sequence data,
C. Zhang, Y . Yang, Z. Du, W. Gan, and P. S. Yu, “HUSP-SP: Faster utility mining on sequence data,”ACM Transactions on Knowledge Discovery from Data, vol. 18, no. 1, pp. 1–21, 2023
work page 2023
-
[20]
EHUSM: Mining high utility sequences with a pessimistic utility model,
T. Truong, A. Tran, H. Duong, B. Le, and P. Fournier-Viger, “EHUSM: Mining high utility sequences with a pessimistic utility model,”Data Science and Pattern Recognition, vol. 4, no. 2, pp. 65–83, 2020
work page 2020
-
[21]
EHAUSM: An efficient algorithm for high average utility sequence mining,
T. Truong, H. Duong, B. Le, and P. Fournier-Viger, “EHAUSM: An efficient algorithm for high average utility sequence mining,”Information Sciences, vol. 515, pp. 302–323, 2020
work page 2020
-
[22]
Pattern mining: Current challenges and opportunities,
P. Fournier-Viger, W. Gan, Y . Wu, M. Nouioua, W. Song, T. Truong, and H. Duong, “Pattern mining: Current challenges and opportunities,” in The 27th International Conference on Database Systems for Advanced Applications, 2022, pp. 34–49
work page 2022
-
[23]
Mining sequential patterns: Generalizations and performance improvements,
R. Srikant and R. Agrawal, “Mining sequential patterns: Generalizations and performance improvements,” inThe International Conference on Extending Database Technology, 1996, pp. 1–17
work page 1996
-
[24]
FreeSpan: Frequent pattern-projected sequential pattern mining,
J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M. Hsu, “FreeSpan: Frequent pattern-projected sequential pattern mining,” inThe 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2000, pp. 355–359
work page 2000
-
[25]
Sequential pattern mining using a bitmap representation,
J. Ayres, J. Flannick, J. Gehrke, and T. Yiu, “Sequential pattern mining using a bitmap representation,” inThe 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002, pp. 429– 435
work page 2002
-
[26]
Fast vertical mining of sequential patterns using co-occurrence information,
P. Fournier-Viger, A. Gomariz, M. Campos, and R. Thomas, “Fast vertical mining of sequential patterns using co-occurrence information,” inThe 18th Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2014, pp. 40–52
work page 2014
-
[27]
NetNCSP: Nonoverlapping closed sequential pattern mining,
Y . Wu, C. Zhu, Y . Li, L. Guo, and X. Wu, “NetNCSP: Nonoverlapping closed sequential pattern mining,”Knowledge-based systems, vol. 196, p. 105812, 2020
work page 2020
-
[28]
CloFAST: Closed sequential pattern mining using sparse and vertical id-lists,
F. Fumarola, P. F. Lanotte, M. Ceci, and D. Malerba, “CloFAST: Closed sequential pattern mining using sparse and vertical id-lists,”Knowledge and Information Systems, vol. 48, no. 2, pp. 429–463, 2016
work page 2016
-
[29]
NetNMSP: Nonoverlapping maximal sequential pattern mining,
Y . Li, S. Zhang, L. Guo, J. Liu, Y . Wu, and X. Wu, “NetNMSP: Nonoverlapping maximal sequential pattern mining,”Applied Intelligence, vol. 52, no. 9, pp. 9861–9884, 2022
work page 2022
-
[30]
Mining maximal sequential patterns without candidate maintenance,
P. Fournier-Viger, C. Wu, and V . S. Tseng, “Mining maximal sequential patterns without candidate maintenance,” inThe 9th International Conference on Advances Data Mining and Applications, 2013, pp. 169– 180
work page 2013
-
[31]
SkOPUS: Mining top- k sequential patterns under leverage,
F. Petitjean, T. Li, N. Tatti, and G. I. Webb, “SkOPUS: Mining top- k sequential patterns under leverage,”Data Mining and Knowledge Discovery, vol. 30, pp. 1086–1111, 2016
work page 2016
-
[32]
TKS: efficient mining of top-k sequential patterns,
P. Fournier-Viger, A. Gomariz, T. Gueniche, E. Mwamikazi, and R. Thomas, “TKS: efficient mining of top-k sequential patterns,” inThe 9th International Conference on Advanced Data Mining and Applications, 2013, pp. 109–120
work page 2013
-
[33]
Goal-oriented sequential pattern for network banking churn analysis,
D. Chiang, Y . Wang, S. Lee, and C. Lin, “Goal-oriented sequential pattern for network banking churn analysis,”Expert Systems With Applications, vol. 25, no. 3, pp. 293–302, 2003
work page 2003
-
[34]
Targeted mining of contiguous sequential patterns,
K. Hu, W. Gan, S. Huang, H. Peng, and P. Fournier-Viger, “Targeted mining of contiguous sequential patterns,”Information Sciences, vol. 653, p. 119791, 2024
work page 2024
-
[35]
A survey of utility-oriented pattern mining,
W. Gan, J. C.-W. Lin, P. Fournier-Viger, H.-C. Chao, V . S. Tseng, and P. S. Yu, “A survey of utility-oriented pattern mining,”IEEE Transactions on Knowledge and Data Engineering, vol. 33, no. 4, pp. 1306–1327, 2021
work page 2021
-
[36]
On efficiently mining high utility sequential patterns,
J. Wang, J. Huang, and Y . Chen, “On efficiently mining high utility sequential patterns,”Knowledge and Information Systems, vol. 49, pp. 597–627, 2016
work page 2016
-
[37]
Fast utility mining on sequence data,
W. Gan, J. C. Lin, J. Zhang, P. Fournier-Viger, H. Chao, and P. S. Yu, “Fast utility mining on sequence data,”IEEE Transactions on Cybernetics, vol. 51, no. 2, pp. 487–500, 2021
work page 2021
-
[38]
Efficiently mining top-k high utility sequential patterns,
J. Yin, Z. Zheng, L. Cao, Y . Song, and W. Wei, “Efficiently mining top-k high utility sequential patterns,” inThe 13th IEEE International Conference on Data Mining, 2013, pp. 1259–1264
work page 2013
-
[39]
LUIM: New low-utility itemset mining framework,
N. Alhusaini, S. Karmoshi, A. Hawbani, L. Jing, A. Alhusaini, and Y . Al-Sharabi, “LUIM: New low-utility itemset mining framework,”IEEE Access, vol. 7, pp. 100 535–100 551, 2019
work page 2019
-
[40]
Enabling knowledge discovery through low utility itemset mining,
X. Zhang, G. Chen, L. Song, and W. Gan, “Enabling knowledge discovery through low utility itemset mining,”Expert Systems With Applications, vol. 265, p. 125955, 2025
work page 2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.