Parallel Two-Stage Approach for Joint Symbolic Approximation of Time Series

Xinye Chen

arxiv: 2401.00109 · v3 · submitted 2023-12-30 · 💻 cs.DS · cs.DB· cs.DC· cs.SC

Parallel Two-Stage Approach for Joint Symbolic Approximation of Time Series

Xinye Chen This is my paper

Pith reviewed 2026-05-24 04:58 UTC · model grok-4.3

classification 💻 cs.DS cs.DBcs.DCcs.SC

keywords time seriessymbolic approximationparallel processingdigitizationscalabilitydata compressionABBA

0 comments

The pith

A parallel two-stage approach decouples local compression from global digitization for scalable joint symbolic approximation of time series.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a method for symbolic approximation that works on many time series at once instead of handling them one after another. It partitions the series into independent parts that can be compressed in parallel and then uses a shared dictionary to assign symbols to all of them together. A two-stage process does local aggregation first and then merges globally to avoid moving all data at once. This is meant to keep the quality of the representation high while making it fast enough for large collections of data on modern computers. If it works, it means symbolic methods can be used as a practical tool for big temporal datasets without losing the shape-preserving benefits.

Core claim

The proposed method decouples local compression from global digitization by partitioning time series into independent domains for parallel compression and digitizing the resulting pieces using a shared global dictionary, with a two-stage parallel digitization scheme that performs aggregation locally before a global merge without full-data reassignment.

What carries the argument

Decoupling of local compression from global digitization with a shared global dictionary and two-stage parallel aggregation.

If this is right

Consistent symbols are produced across different time series.
Runtime is substantially reduced on multicore and distributed-memory systems.
Reconstruction quality remains competitive with sequential per-series methods.
The method can serve as an efficient high-level parallel tool for large-scale temporal data analysis.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach may allow pattern discovery across multiple series that independent processing cannot achieve.
It could be adapted to other symbolic representation techniques to improve their scalability.
In practice, the local aggregation step might lower communication costs in distributed computing environments.

Load-bearing premise

That partitioning time series into independent domains for parallel compression and using a shared global dictionary will preserve the shape-preserving and consistency properties without introducing significant quality loss.

What would settle it

Running the method on the paper's large synthetic benchmarks and finding that reconstruction quality falls substantially below the sequential ABBA baseline or that runtime savings are negligible.

Figures

Figures reproduced from arXiv: 2401.00109 by Xinye Chen.

**Figure 1.** Figure 1: 2-dimensional data partition using vector quantiza [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗

**Figure 3.** Figure 3: shows the simulation of k-means++ and sampling-based k-means (sampling 𝑟 = 10% of data) on Gaussian blobs data with 10 clusters, proceeding by increasing data sizes. The result is as illustrated in [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: Value of 𝛼 as tol increases. 5 JOINT SYMBOLIC APPROXIMATION After discussing the two ABBA methods, we introduce a joint symbolic aggregate approximation on how to perform fast ABBA symbolization on multiple time series while retaining the symbolic consistency. This joint approximation framework is also applicable to large-scale univariate time series and multivariate time series. The ideal case of symbol… view at source ↗

**Figure 5.** Figure 5: Fork and join model: since there is no dependency [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

**Figure 6.** Figure 6: Reconstruction of symbolic representation for [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗

**Figure 7.** Figure 7: MSE of JABBA with varying number of partitions [PITH_FULL_IMAGE:figures/full_fig_p009_7.png] view at source ↗

**Figure 8.** Figure 8: Runtime of JABBA with varying number of parti [PITH_FULL_IMAGE:figures/full_fig_p009_8.png] view at source ↗

**Figure 9.** Figure 9: Speedup. introduce a joint symbolic approximation method that improves the speed of ABBA symbolization and achieves symbolic consistency in each representation. The framework of joint symbolic approximation enables parallel computing for further speedup. Attributed to the symbolic consistency, a manipulation of natural language processing and text mining techniques is available in time series. The conver… view at source ↗

read the original abstract

As time-series applications grow larger, there is increasing demand for symbolic representations that are compact, accurate, and scalable across many signals and computing resources. Current ABBA-based symbolic approximation methods produce high-quality, shape-preserving representations, but they handle each time series separately and sequentially. This means they do not ensure consistent symbols across different series and cannot fully exploit modern multicore systems and distributed-memory systems. This paper presents a joint symbolic time-series approximation method for large-scale time series. The proposed method decouples local compression from global digitization: (i) time series are partitioned into independent domains that can be compressed in parallel, and (ii) the resulting pieces are digitized using a shared global dictionary. To further improve scalability, we introduce a two-stage parallel digitization scheme, in which aggregation is first performed locally and then merged globally without requiring a full-data reassignment step. Extensive experiments on time-series datasets and large synthetic benchmarks show that our approach maintains competitive reconstruction quality while substantially reducing runtime. These results show that joint symbolic approximation can serve as an efficient, high-level parallel tool for analyzing large-scale temporal data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a parallel two-stage ABBA scheme that splits local compression from global merge to cut runtime on large collections, but the merge step's effect on digitization quality is not tightly bounded.

read the letter

The main point is that this paper describes a parallel two-stage approach for joint symbolic approximation of multiple time series using an ABBA-like method. It partitions the series into independent domains for local compression in parallel, then uses a two-stage digitization where local aggregation is merged globally to build a shared dictionary without needing to reassign all data points. This is new relative to the sequential per-series methods in prior ABBA work, which didn't handle consistency across series or exploit parallel computing resources. What the paper does well is laying out a clear decoupling of local and global steps to address scalability for large collections of time series. The two-stage scheme is a practical way to reduce the communication or computation overhead in a distributed setting. The abstract reports that experiments on real datasets and large synthetic benchmarks show the method keeps reconstruction quality competitive while cutting runtime substantially, which is the kind of result that matters for practical use. The soft spots are in the details of the merge step. The concern is that operating only on aggregated local statistics for the global merge could lead to different symbol assignments or boundaries compared to a full global pass, especially if the local partitions have uneven distributions. Without an analytic bound showing the error stays small, or without experiments that directly compare to the sequential version on the same data with varying partition schemes, it's not clear how much the shape-preserving property holds up. The abstract mentions extensive experiments but doesn't specify error bars, exclusion rules, or how quality was measured across different partitionings. This work is for researchers in data science, databases, or time series analysis who need to handle many signals at once and want a scalable symbolic tool. Someone working on parallel algorithms or approximation techniques would get the most out of the scheme and the reported speedups. It deserves a serious referee because the idea targets a genuine limitation in existing methods and provides a concrete algorithmic fix, even though the evaluation of the quality trade-off could be tighter. I would recommend sending this to peer review rather than desk rejecting it.

Referee Report

1 major / 0 minor

Summary. The paper proposes a parallel two-stage joint symbolic approximation method for time series based on ABBA. It decouples local compression (partitioning series into independent domains for parallel processing) from global digitization (using a shared dictionary via local aggregation followed by global merge without full reassignment), claiming that this yields consistent symbols across series, competitive reconstruction quality, and substantially reduced runtime compared to sequential per-series ABBA, as supported by experiments on real datasets and large synthetic benchmarks.

Significance. If the quality preservation claim holds under the two-stage merge, the method would enable scalable, consistent symbolic representations for large-scale time-series analysis on multicore and distributed systems. The extensive experiments on synthetic benchmarks provide a concrete strength for assessing practical performance.

major comments (1)

[§3.2] §3.2: The global merge step operates solely on aggregated local statistics without a full-data reassignment pass. When partition distributions differ, this can shift centroids or symbol boundaries, which directly risks violating the shape-preserving property guaranteed by sequential ABBA per series. No analytic bound is derived showing that the resulting merge error remains controlled by the local aggregation error; this is load-bearing for the central claim that quality remains competitive.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their insightful comments on our manuscript. We address the major comment regarding the global merge step in detail below.

read point-by-point responses

Referee: [§3.2] §3.2: The global merge step operates solely on aggregated local statistics without a full-data reassignment pass. When partition distributions differ, this can shift centroids or symbol boundaries, which directly risks violating the shape-preserving property guaranteed by sequential ABBA per series. No analytic bound is derived showing that the resulting merge error remains controlled by the local aggregation error; this is load-bearing for the central claim that quality remains competitive.

Authors: We acknowledge that the global merge relies on aggregated local statistics and that no analytic bound on the approximation error is provided in the manuscript. The two-stage approach is motivated by the need for scalability in large-scale settings where a full reassignment would be prohibitive. The aggregation step collects sufficient statistics to construct a global dictionary that approximates what a full pass would yield. While differing partition distributions could in principle affect centroids, our design uses a merge that combines the local dictionaries in a way that preserves the overall distribution. The central claim of competitive quality is supported by extensive experiments on real datasets and large synthetic benchmarks, where reconstruction errors are shown to be comparable to sequential ABBA even under varied partitioning. We agree that a theoretical bound would be valuable but note that deriving tight bounds for such merge operations in symbolic approximation is non-trivial; the empirical validation serves as the primary evidence for practicality. revision: partial

Circularity Check

0 steps flagged

No circularity: algorithmic proposal validated by external experiments

full rationale

The paper introduces a parallel two-stage method that partitions time series for local compression followed by global dictionary digitization via aggregation and merge. Claims rest on runtime reduction and competitive reconstruction quality shown via experiments on real and synthetic datasets. No equations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text that would reduce the central claims to inputs by construction. The approach is presented as an engineering decoupling with empirical checks, not a self-referential derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no specific details on free parameters, axioms, or invented entities.

pith-pipeline@v0.9.0 · 5722 in / 1029 out tokens · 32942 ms · 2026-05-24T04:58:38.880409+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

joint symbolic aggregate approximation ... partitional compression ... shared global dictionary ... two-stage parallel digitization ... auto digitization via Brownian bridge (Eq. 21)
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

JABBA (GA) ... greedy aggregation ... sampling-based k-means

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages

[1]

Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulos, and Prabhakar Ragha- van. 1998. Automatic subspace clustering of high dimensional data for data mining applications. Proceedings of the ACM SIGMOD International Conference on Management of Data 27 (1998), 94–105

work page 1998
[2]

2007.k-means++: the advantages of careful seeding

David Arthur and Sergei Vassilvitskii. 2007.k-means++: the advantages of careful seeding. In SODA ’07: Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics, 1027– 1035

work page 2007
[3]

Bagnall, Hoang Anh Dau, Jason Lines, Michael Flynn, James Large, Aaron Bostrom, Paul Southam, and Eamonn J

Anthony J. Bagnall, Hoang Anh Dau, Jason Lines, Michael Flynn, James Large, Aaron Bostrom, Paul Southam, and Eamonn J. Keogh. 2018. The UEA multivariate time series classification archive. CoRR (2018)

work page 2018
[4]

Konstantinos Bountrogiannis, George Tzagkarakis, and Panagiotis Tsakalides

work page
[5]

IEEE Transactions on Knowledge and Data Engineering 35, 6 (2023), 5752–5766

Distribution Agnostic Symbolic Representations for Time Series Dimension- ality Reduction and Online Anomaly Detection. IEEE Transactions on Knowledge and Data Engineering 35, 6 (2023), 5752–5766

work page 2023
[6]

Ricardo J. G. B. Campello, Davoud Moulavi, and Joerg Sander. 2013. Density- Based Clustering Based on Hierarchical Density Estimates. InAdvances in Knowl- edge Discovery and Data Mining . Springer, 160–172

work page 2013
[7]

Xinye Chen and Stefan Güttel. 2022. An Efficient Aggregation Method for the Symbolic Representation of Temporal Data. ACM Transactions on Knowledge Discovery from Data (2022)

work page 2022
[8]

Xinye Chen and Stefan Güttel. 2022. Fast and explainable clustering based on sorting. (2022), 25. arXiv:2202.01456

work page arXiv 2022
[9]

Sanjoy Dasgupta and Yoav Freund. 2008. Random Projection Trees and Low Dimensional Manifolds. In Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing (STOC ’08). ACM, 537–546

work page 2008
[10]

Sanjoy Dasgupta and Yoav Freund. 2009. Random Projection Trees for Vector Quantization. IEEE Transactions on Information Theory 55, 7 (2009), 3229–3242

work page 2009
[11]

Hoang Anh Dau, Anthony Bagnall, Kaveh Kamgar, Chin-Chia Michael Yeh, Yan Zhu, Shaghayegh Gharghabi, Chotirat Ann Ratanamahatana, and Eamonn Keogh

work page
[12]

IEEE/CAA Journal of Automatica Sinica 6, 6 (2019), 1293–1305

The UCR time series archive. IEEE/CAA Journal of Automatica Sinica 6, 6 (2019), 1293–1305

work page 2019
[13]

Drineas, A

P. Drineas, A. Frieze, R. Kannan, S. Vempala, and V. Vinay. 2004. Clustering large graphs via the singular value decomposition. Machine Learning 56, 1–3 (2004), 9–33

work page 2004
[14]

Steven Elsworth and Stefan Güttel. 2020. ABBA: adaptive Brownian bridge-based symbolic aggregation of time series. Data Mining and Knowledge Discovery 34 (2020), 1175–1200

work page 2020
[15]

Steven Elsworth and Stefan Güttel. 2020. Time series forecasting using LSTM networks: A symbolic approach. (2020), 12. arXiv:2003.05672

work page arXiv 2020
[16]

Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu. 1996. A Density- Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD’96). AAAI Press, 226–231

work page 1996
[17]

Yifeng Gao and Jessica Lin. 2019. Discovering Subdimensional Motifs of Different Lengths in Large-Scale Multivariate Time Series. InIEEE International Conference on Data Mining. 220–229

work page 2019
[18]

Robert Gray. 1984. Vector quantization. IEEE ASSP Magazine 1, 2 (1984), 4–29

work page 1984
[19]

Keogh, J

E. Keogh, J. Lin, and A. Fu. 2005. HOT SAX: efficiently finding the most un- usual time series subsequence. In IEEE International Conference on Data Mining (ICDM’05). 1–8

work page 2005
[20]

Eamonn Keogh, Stefano Lonardi, and Bill ’Yuan-chi’ Chiu. 2002. Finding Surpris- ing Patterns in a Time Series Database in Linear Time and Space. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’02). ACM, 550–556

work page 2002
[21]

Xiaosheng Li and Jessica Lin. 2017. Linear Time Complexity Time Series Classi- fication with Bag-of-Pattern-Features. In IEEE International Conference on Data Mining. 277–286

work page 2017
[22]

Xiaosheng Li, Jessica Lin, and Liang Zhao. 2021. Time Series Clustering in Linear Time Complexity. Data Mining and Knowledge Discovery 35, 6 (2021), 2369–2388

work page 2021
[23]

Yuan Li and Jessica Lin. 2010. Approximate Variable-Length Time Series Motif Discovery Using Grammar Inference. In Proceedings of the 10th International Workshop on Multimedia Data Mining (MDMKDD ’10). ACM, 9

work page 2010
[24]

Jessica Lin, Eamonn Keogh, Stefano Lonardi, and Bill Chiu. 2003. A symbolic representation of time series, with implications for streaming algorithms. In Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery. ACM, 2–11

work page 2003
[25]

Jessica Lin, Eamonn Keogh, Li Wei, and Stefano Lonardi. 2007. Experiencing SAX: a novel symbolic representation of time series. Data Mining and Knowledge Discovery 15, 2 (2007), 107–144

work page 2007
[26]

Lawrence Zitnick

Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common Objects in Context. European Conference on Computer Vision (2014), 740–755

work page 2014
[27]

Lkhagva, Yu Suzuki, and K

B. Lkhagva, Yu Suzuki, and K. Kawagoe. 2006. New Time Series Data Represen- tation ESAX for Financial Applications. In 22nd International Conference on Data Engineering Workshops (ICDEW’06). x115–x115. https://doi.org/10.1109/ICDEW. 2006.99

work page doi:10.1109/icdew 2006
[28]

S. Lloyd. 1982. Least squares quantization in PCM. IEEE Transactions on Infor- mation Theory 28, 2 (1982), 129–137

work page 1982
[29]

Stuart P. Lloyd. 1982. Least squares quantization in PCM. Transactions on Information Theory 28 (1982), 129–137

work page 1982
[30]

Meena Mahajan, Prajakta Nimbhorkar, and Kasturi Varadarajan. 2012. The planar k-means problem is NP-hard. Theoretical Computer Science 442 (2012), 13–21. Special Issue on the Workshop on Algorithms and Computation (WALCOM 2009)

work page 2012
[31]

Simon Malinowski, Thomas Guyet, René Quiniou, and Romain Tavenard. 2013. 1d-SAX: A Novel Symbolic Representation for Time Series. In Advances in Intel- ligent Data Analysis XII

work page 2013
[32]

Thach Le Nguyen and Georgiana Ifrim. 2023. Fast Time Series Classification with Random Symbolic Subsequences. InAdvanced Analytics and Learning on Temporal Data: 7th ECML PKDD Workshop, AALTD 2022, Grenoble, France, September 19–23, 2022, Revised Selected Papers . Springer, 50—-65

work page 2023
[33]

Pedregosa, G

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cour- napeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830

work page 2011
[34]

N. Ruta, N. Sawada, K. McKeough, M. Behrisch, and J. Beyer. 2019. SAX Nav- igator: Time Series Exploration through Hierarchical Clustering. In 2019 IEEE Visualization Conference. IEEE, 236–240

work page 2019
[35]

Boedi- hardjo, Crystal Chen, and Susan Frankenstein

Pavel Senin, Jessica Lin, Xing Wang, Tim Oates, Sunil Gandhi, Arnold P. Boedi- hardjo, Crystal Chen, and Susan Frankenstein. 2015. Time series anomaly dis- covery with grammar-based compression.. In 18th International Conference on Extending Database Technology. OpenProceedings.org, 481–492

work page 2015
[36]

Pavel Senin and Sergey Malinchik. 2013. SAX-VSM: Interpretable Time Se- ries Classification Using SAX and Vector Space Model. In IEEE International Conference on Data Mining . 1175–1180

work page 2013
[37]

Yu and Jianbo Shi

Stella X. Yu and Jianbo Shi. 2003. Multiclass spectral clustering. In Proceedings of the Ninth IEEE International Conference on Computer Vision , Vol. 2. IEEE, 313

work page 2003
[38]

Yuncong Yu, Tim Becker, Le Minh Trinh, and Michael Behrisch. 2023. SAXRegEx: Multivariate time series pattern search with symbolic representation, regular expression, and query expansion. Computers & Graphics 112 (2023), 13–21

work page 2023
[39]

Shengdong Zhang, Soheil Bahrampour, Naveen Ramakrishnan, Lukas Schott, and Mohak Shah. 2017. Deep learning on symbolic representations for large-scale heterogeneous time-series event prediction. In2017 IEEE International Conference on Acoustics, Speech and Signal Processing . 5970–5974

work page 2017
[40]

Tian Zhang, Raghu Ramakrishnan, and Miron Livny. 1996. BIRCH: An efficient data clustering method for very large databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data . ACM, 103–114

work page 1996

[1] [1]

Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulos, and Prabhakar Ragha- van. 1998. Automatic subspace clustering of high dimensional data for data mining applications. Proceedings of the ACM SIGMOD International Conference on Management of Data 27 (1998), 94–105

work page 1998

[2] [2]

2007.k-means++: the advantages of careful seeding

David Arthur and Sergei Vassilvitskii. 2007.k-means++: the advantages of careful seeding. In SODA ’07: Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics, 1027– 1035

work page 2007

[3] [3]

Bagnall, Hoang Anh Dau, Jason Lines, Michael Flynn, James Large, Aaron Bostrom, Paul Southam, and Eamonn J

Anthony J. Bagnall, Hoang Anh Dau, Jason Lines, Michael Flynn, James Large, Aaron Bostrom, Paul Southam, and Eamonn J. Keogh. 2018. The UEA multivariate time series classification archive. CoRR (2018)

work page 2018

[4] [4]

Konstantinos Bountrogiannis, George Tzagkarakis, and Panagiotis Tsakalides

work page

[5] [5]

IEEE Transactions on Knowledge and Data Engineering 35, 6 (2023), 5752–5766

Distribution Agnostic Symbolic Representations for Time Series Dimension- ality Reduction and Online Anomaly Detection. IEEE Transactions on Knowledge and Data Engineering 35, 6 (2023), 5752–5766

work page 2023

[6] [6]

Ricardo J. G. B. Campello, Davoud Moulavi, and Joerg Sander. 2013. Density- Based Clustering Based on Hierarchical Density Estimates. InAdvances in Knowl- edge Discovery and Data Mining . Springer, 160–172

work page 2013

[7] [7]

Xinye Chen and Stefan Güttel. 2022. An Efficient Aggregation Method for the Symbolic Representation of Temporal Data. ACM Transactions on Knowledge Discovery from Data (2022)

work page 2022

[8] [8]

Xinye Chen and Stefan Güttel. 2022. Fast and explainable clustering based on sorting. (2022), 25. arXiv:2202.01456

work page arXiv 2022

[9] [9]

Sanjoy Dasgupta and Yoav Freund. 2008. Random Projection Trees and Low Dimensional Manifolds. In Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing (STOC ’08). ACM, 537–546

work page 2008

[10] [10]

Sanjoy Dasgupta and Yoav Freund. 2009. Random Projection Trees for Vector Quantization. IEEE Transactions on Information Theory 55, 7 (2009), 3229–3242

work page 2009

[11] [11]

Hoang Anh Dau, Anthony Bagnall, Kaveh Kamgar, Chin-Chia Michael Yeh, Yan Zhu, Shaghayegh Gharghabi, Chotirat Ann Ratanamahatana, and Eamonn Keogh

work page

[12] [12]

IEEE/CAA Journal of Automatica Sinica 6, 6 (2019), 1293–1305

The UCR time series archive. IEEE/CAA Journal of Automatica Sinica 6, 6 (2019), 1293–1305

work page 2019

[13] [13]

Drineas, A

P. Drineas, A. Frieze, R. Kannan, S. Vempala, and V. Vinay. 2004. Clustering large graphs via the singular value decomposition. Machine Learning 56, 1–3 (2004), 9–33

work page 2004

[14] [14]

Steven Elsworth and Stefan Güttel. 2020. ABBA: adaptive Brownian bridge-based symbolic aggregation of time series. Data Mining and Knowledge Discovery 34 (2020), 1175–1200

work page 2020

[15] [15]

Steven Elsworth and Stefan Güttel. 2020. Time series forecasting using LSTM networks: A symbolic approach. (2020), 12. arXiv:2003.05672

work page arXiv 2020

[16] [16]

Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu. 1996. A Density- Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD’96). AAAI Press, 226–231

work page 1996

[17] [17]

Yifeng Gao and Jessica Lin. 2019. Discovering Subdimensional Motifs of Different Lengths in Large-Scale Multivariate Time Series. InIEEE International Conference on Data Mining. 220–229

work page 2019

[18] [18]

Robert Gray. 1984. Vector quantization. IEEE ASSP Magazine 1, 2 (1984), 4–29

work page 1984

[19] [19]

Keogh, J

E. Keogh, J. Lin, and A. Fu. 2005. HOT SAX: efficiently finding the most un- usual time series subsequence. In IEEE International Conference on Data Mining (ICDM’05). 1–8

work page 2005

[20] [20]

Eamonn Keogh, Stefano Lonardi, and Bill ’Yuan-chi’ Chiu. 2002. Finding Surpris- ing Patterns in a Time Series Database in Linear Time and Space. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’02). ACM, 550–556

work page 2002

[21] [21]

Xiaosheng Li and Jessica Lin. 2017. Linear Time Complexity Time Series Classi- fication with Bag-of-Pattern-Features. In IEEE International Conference on Data Mining. 277–286

work page 2017

[22] [22]

Xiaosheng Li, Jessica Lin, and Liang Zhao. 2021. Time Series Clustering in Linear Time Complexity. Data Mining and Knowledge Discovery 35, 6 (2021), 2369–2388

work page 2021

[23] [23]

Yuan Li and Jessica Lin. 2010. Approximate Variable-Length Time Series Motif Discovery Using Grammar Inference. In Proceedings of the 10th International Workshop on Multimedia Data Mining (MDMKDD ’10). ACM, 9

work page 2010

[24] [24]

Jessica Lin, Eamonn Keogh, Stefano Lonardi, and Bill Chiu. 2003. A symbolic representation of time series, with implications for streaming algorithms. In Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery. ACM, 2–11

work page 2003

[25] [25]

Jessica Lin, Eamonn Keogh, Li Wei, and Stefano Lonardi. 2007. Experiencing SAX: a novel symbolic representation of time series. Data Mining and Knowledge Discovery 15, 2 (2007), 107–144

work page 2007

[26] [26]

Lawrence Zitnick

Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common Objects in Context. European Conference on Computer Vision (2014), 740–755

work page 2014

[27] [27]

Lkhagva, Yu Suzuki, and K

B. Lkhagva, Yu Suzuki, and K. Kawagoe. 2006. New Time Series Data Represen- tation ESAX for Financial Applications. In 22nd International Conference on Data Engineering Workshops (ICDEW’06). x115–x115. https://doi.org/10.1109/ICDEW. 2006.99

work page doi:10.1109/icdew 2006

[28] [28]

S. Lloyd. 1982. Least squares quantization in PCM. IEEE Transactions on Infor- mation Theory 28, 2 (1982), 129–137

work page 1982

[29] [29]

Stuart P. Lloyd. 1982. Least squares quantization in PCM. Transactions on Information Theory 28 (1982), 129–137

work page 1982

[30] [30]

Meena Mahajan, Prajakta Nimbhorkar, and Kasturi Varadarajan. 2012. The planar k-means problem is NP-hard. Theoretical Computer Science 442 (2012), 13–21. Special Issue on the Workshop on Algorithms and Computation (WALCOM 2009)

work page 2012

[31] [31]

Simon Malinowski, Thomas Guyet, René Quiniou, and Romain Tavenard. 2013. 1d-SAX: A Novel Symbolic Representation for Time Series. In Advances in Intel- ligent Data Analysis XII

work page 2013

[32] [32]

Thach Le Nguyen and Georgiana Ifrim. 2023. Fast Time Series Classification with Random Symbolic Subsequences. InAdvanced Analytics and Learning on Temporal Data: 7th ECML PKDD Workshop, AALTD 2022, Grenoble, France, September 19–23, 2022, Revised Selected Papers . Springer, 50—-65

work page 2023

[33] [33]

Pedregosa, G

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cour- napeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830

work page 2011

[34] [34]

N. Ruta, N. Sawada, K. McKeough, M. Behrisch, and J. Beyer. 2019. SAX Nav- igator: Time Series Exploration through Hierarchical Clustering. In 2019 IEEE Visualization Conference. IEEE, 236–240

work page 2019

[35] [35]

Boedi- hardjo, Crystal Chen, and Susan Frankenstein

Pavel Senin, Jessica Lin, Xing Wang, Tim Oates, Sunil Gandhi, Arnold P. Boedi- hardjo, Crystal Chen, and Susan Frankenstein. 2015. Time series anomaly dis- covery with grammar-based compression.. In 18th International Conference on Extending Database Technology. OpenProceedings.org, 481–492

work page 2015

[36] [36]

Pavel Senin and Sergey Malinchik. 2013. SAX-VSM: Interpretable Time Se- ries Classification Using SAX and Vector Space Model. In IEEE International Conference on Data Mining . 1175–1180

work page 2013

[37] [37]

Yu and Jianbo Shi

Stella X. Yu and Jianbo Shi. 2003. Multiclass spectral clustering. In Proceedings of the Ninth IEEE International Conference on Computer Vision , Vol. 2. IEEE, 313

work page 2003

[38] [38]

Yuncong Yu, Tim Becker, Le Minh Trinh, and Michael Behrisch. 2023. SAXRegEx: Multivariate time series pattern search with symbolic representation, regular expression, and query expansion. Computers & Graphics 112 (2023), 13–21

work page 2023

[39] [39]

Shengdong Zhang, Soheil Bahrampour, Naveen Ramakrishnan, Lukas Schott, and Mohak Shah. 2017. Deep learning on symbolic representations for large-scale heterogeneous time-series event prediction. In2017 IEEE International Conference on Acoustics, Speech and Signal Processing . 5970–5974

work page 2017

[40] [40]

Tian Zhang, Raghu Ramakrishnan, and Miron Livny. 1996. BIRCH: An efficient data clustering method for very large databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data . ACM, 103–114

work page 1996