
arxiv: 1907.00700 · v1 · pith:ICJLE4EI · submitted 2019-06-28 · 💻 cs.LG · stat.ML

An Improvement of PAA on Trend-Based Approximation for Time Series

Pith reviewed 2026-05-25 13:54 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords: time series, PAA, piecewise aggregate approximation, trend approximation, dimension reduction, similarity search, classification, anomaly detection

The pith

Two trend-capturing extensions to PAA raise accuracy on classification and anomaly detection tasks; the first is proven to preserve PAA's lower bound.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses a limitation of Piecewise Aggregate Approximation (PAA): because each segment is reduced to its mean, trend information is lost. It introduces two methods: one that splits each segment at the mean and records separate averages for the upper and lower parts, and another that uses a binary string to indicate trend direction relative to the mean. The first method is proven to satisfy the lower-bound condition, ensuring no false dismissals in similarity search. Experiments on classification and anomaly detection report gains in accuracy and effectiveness from better trend extraction.

Core claim

We propose two new approaches for time series that utilize approximate trend feature information. Our first method records the trend using the relative mean value of each segment: it divides each segment into two parts and uses the numerical average of each part to represent the trend. We proved that this method satisfies the lower bound, which guarantees no false dismissals. Our second method uses a binary string, also relative to the mean of each segment, to record the trend. Our methods are applied to similarity measurement in classification and anomaly detection, and the experimental results show improved accuracy and effectiveness from extracting the trend feature suitably.

What carries the argument

Trend feature recording via split-segment averages (for the first method) or binary strings (for the second), applied to each PAA segment relative to its mean, to capture directional information missed by standard averaging.

If this is right

  • The split-segment method guarantees no false dismissals in similarity searches due to the proven lower bound.
  • Both methods can be applied to improve performance in time series classification tasks.
  • Both methods enhance effectiveness in anomaly detection applications.
  • Extracting trend features suitably leads to measurable gains in accuracy and effectiveness.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The binary string approach may offer additional compression benefits beyond the split-average method.
  • These trend extensions could be tested on other distance measures to check if the lower bound holds more broadly.
  • The methods might combine with slope or other segment features for further task-specific gains.

Load-bearing premise

That recording trend via split-segment averages or binary strings relative to each segment mean is sufficient to improve task performance while the lower-bound property continues to hold for the distance measures used in practice.

What would settle it

A similarity-search experiment in which the split-average method produces any false dismissals (which its claimed lower bound rules out, and which standard PAA never produces), or classification experiments showing no accuracy improvement over PAA.

Figures

Figures reproduced from arXiv: 1907.00700 by Ao Yin, Chunkai Zhang, Keli Zhang, Xing Zhang, Yingyang Chen, Zhen Qin, Zoe L. Jiang.

Figure 1. PAA representation of one of the time series in ECG200. Raw denotes the original time series and PAA the transformed time series; the length of the time series is 96. … when two sequences have the same mean value but different trends. For PAA, the distance measure was proposed as Equation (2):
$\mathrm{Dist}(\bar{Q}, \bar{P}) = \sqrt{\frac{n}{w}}\,\sqrt{\sum_{i=1}^{w} (\bar{p}_i - \bar{q}_i)^2} \qquad (2)$
$ED(Q, P) = \ldots$
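To make Eq. (2) concrete, here is a minimal Python sketch of PAA and its distance. It assumes equal-length segments (n divisible by w) and plain lists rather than any particular library; the function names `paa`, `paa_dist` and `euclidean` are hypothetical. It also checks, for one hand-picked pair, that the PAA distance does not exceed the Euclidean distance.

```python
import math
from typing import List, Sequence


def paa(x: Sequence[float], w: int) -> List[float]:
    """Piecewise Aggregate Approximation: the mean of each of w equal-length segments.

    Assumes len(x) is a multiple of w (the equal-length case used by Eq. (2)).
    """
    n = len(x)
    if n % w != 0:
        raise ValueError("this sketch assumes len(x) is a multiple of w")
    seg_len = n // w
    return [sum(x[i * seg_len:(i + 1) * seg_len]) / seg_len for i in range(w)]


def euclidean(p: Sequence[float], q: Sequence[float]) -> float:
    """Point-wise Euclidean distance ED(Q, P) on the raw series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def paa_dist(p_bar: Sequence[float], q_bar: Sequence[float], n: int) -> float:
    """Eq. (2): sqrt(n / w) * sqrt(sum_i (p_bar_i - q_bar_i)^2)."""
    w = len(p_bar)
    return math.sqrt(n / w) * math.sqrt(sum((a - b) ** 2 for a, b in zip(p_bar, q_bar)))


p = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
q = [2.0, 2.0, 2.0, 2.0, 6.0, 4.0, 9.0, 5.0]
n, w = len(p), 4

print(paa(p, w))  # [1.5, 3.5, 5.5, 7.5]
print(paa_dist(paa(p, w), paa(q, w), n) <= euclidean(p, q))  # True
```

For this pair the PAA distance is √6 ≈ 2.45 and the Euclidean distance is √24 ≈ 4.90.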
Figure 2. Comparison of one segment from two time series with very close mean values: the mean of ts1 is 1.012 and that of ts2 is 1.01. Although ts1 and ts2 are clearly different, the PAA distance calculation treats them as similar. Furthermore, PAA is an approximate method to fit the original sequence, so the maximum and minimum values are lost. To address the above problems, we propose two methods to…
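The effect described in Figure 2 can be reproduced with a toy example (the values are hypothetical, not the ts1/ts2 data). A rising and a falling segment containing the same values in reverse order share the same mean, so a single-segment PAA distance is zero while the Euclidean distance is not.

```python
import math


def segment_mean(seg):
    """The only value PAA keeps for a segment."""
    return sum(seg) / len(seg)


rising = [0.5, 1.0, 1.5]   # hypothetical segment with an upward trend
falling = [1.5, 1.0, 0.5]  # same values in reverse order: downward trend

# PAA maps both segments to the same value.
print(segment_mean(rising), segment_mean(falling))  # 1.0 1.0

# With one segment of length 3, n / w = 3, so the PAA distance is sqrt(3) * |mean gap| = 0 ...
paa_gap = math.sqrt(len(rising)) * abs(segment_mean(rising) - segment_mean(falling))
# ... while the point-wise Euclidean distance is not.
ed = math.sqrt(sum((a - b) ** 2 for a, b in zip(rising, falling)))
print(paa_gap, ed)  # 0.0 1.4142135623730951
```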
Figure 3. Up-mean and below-mean in one time series segment. We define the up difference as the difference of all time points in a segment above the mean value, and the below difference as the difference of all time points in the segment below the mean value. The up-mean value $\Delta q_u$ and below-mean value $\Delta q_b$, which are relative to the mean value of each segment, can then be defined as: $\Delta q_{u_i} = \frac{1}{u_i} \sum_{k=\frac{n}{w}(i-1)\ldots}^{\frac{n}{w} i} \ldots$
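Because the formula above is cut off, the following Python sketch rests on an assumed reading: for each segment, the up-mean is the average deviation from the segment mean over points strictly above it, and the below-mean is the average deviation over points strictly below it (negative by this convention). The name `trend_features` and the 0.0 default for an empty side are hypothetical choices. The example shows two segments that share a mean of 3.0, and are therefore indistinguishable to PAA, but receive different up/below-mean values.

```python
from typing import List, Sequence, Tuple


def trend_features(x: Sequence[float], w: int) -> List[Tuple[float, float, float]]:
    """For each of w equal-length segments return (mean, up_mean, below_mean).

    Assumed definitions (a hypothetical reading of the truncated formula):
      up_mean    = average of (x_k - mean) over points strictly above the mean
      below_mean = average of (x_k - mean) over points strictly below the mean
    A side with no points (e.g. a constant segment) is recorded as 0.0.
    """
    n = len(x)
    if n % w != 0:
        raise ValueError("this sketch assumes len(x) is a multiple of w")
    seg_len = n // w
    features = []
    for i in range(w):
        seg = x[i * seg_len:(i + 1) * seg_len]
        mean = sum(seg) / seg_len
        up = [v - mean for v in seg if v > mean]
        below = [v - mean for v in seg if v < mean]
        up_mean = sum(up) / len(up) if up else 0.0
        below_mean = sum(below) / len(below) if below else 0.0
        features.append((mean, up_mean, below_mean))
    return features


print(trend_features([2.0, 2.0, 4.0, 4.0], 1))  # [(3.0, 1.0, -1.0)]
print(trend_features([0.0, 3.0, 3.0, 6.0], 1))  # [(3.0, 3.0, -3.0)]
```

Under this reading the extra values depend only on the set of values in a segment, not on their order.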
Figure 4. The trend representation. $P_i$ and $Q_i$ denote corresponding segments of time series P and Q, and the dotted line is the mean value. In 4(a) the mean value is 0.8375 and the binary string is $B_{P_i} = 0110000$; in 4(b) the mean value is 1.6714 and the binary string is $B_{Q_i} = 0100110$.
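The caption does not say how each bit is derived, so this Python sketch assumes the simplest reading consistent with "relative to mean": bit k is 1 when the k-th point lies above the segment mean and 0 otherwise. Comparing two strings with Hamming distance is also an assumption, and the function names `trend_bits` and `hamming` are hypothetical.

```python
from typing import List, Sequence


def trend_bits(x: Sequence[float], w: int) -> List[str]:
    """One binary string per segment: '1' where a point is above the segment mean (assumed rule)."""
    n = len(x)
    if n % w != 0:
        raise ValueError("this sketch assumes len(x) is a multiple of w")
    seg_len = n // w
    strings = []
    for i in range(w):
        seg = x[i * seg_len:(i + 1) * seg_len]
        mean = sum(seg) / seg_len
        strings.append("".join("1" if v > mean else "0" for v in seg))
    return strings


def hamming(a: str, b: str) -> int:
    """Number of positions where two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have equal length")
    return sum(c1 != c2 for c1, c2 in zip(a, b))


# Both hypothetical segments have mean 2.0, so PAA cannot tell them apart.
p_bits = trend_bits([1.0, 3.0, 4.0, 0.0], 1)
q_bits = trend_bits([1.0, 3.0, 0.0, 4.0], 1)
print(p_bits, q_bits, hamming(p_bits[0], q_bits[0]))  # ['0110'] ['0101'] 2
```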
Figure 5. Comparison of tightness between NT PAA and PAA. From Fig. 5 we can see that when the reduction ratio is 1 the tightness is equal, and as the reduction ratio grows the tightness decreases. 4.4 Comparison on Classification. In this experiment, our proposed methods BT PAA and NT PAA are compared with three other distance measures, Cosine [5], Euclidean [?] and PAA, on 24 data se…
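Tightness is commonly measured as the ratio of the lower-bound distance to the true Euclidean distance, a value in [0, 1] where 1 means the bound is exact. Assuming that definition, and that the reduction ratio means n/w, the sketch below computes it for standard PAA on one hand-picked pair. The NT PAA distance is not reproduced on this page, so it is not included; the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def paa(x: Sequence[float], w: int) -> List[float]:
    seg_len = len(x) // w  # assumes len(x) is a multiple of w
    return [sum(x[i * seg_len:(i + 1) * seg_len]) / seg_len for i in range(w)]


def euclidean(p: Sequence[float], q: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def paa_dist(p_bar: Sequence[float], q_bar: Sequence[float], n: int) -> float:
    """Eq. (2) lower-bounding distance."""
    w = len(p_bar)
    return math.sqrt(n / w) * math.sqrt(sum((a - b) ** 2 for a, b in zip(p_bar, q_bar)))


def tightness(p: Sequence[float], q: Sequence[float], w: int) -> float:
    """Tightness of the lower bound: PAA distance divided by Euclidean distance."""
    return paa_dist(paa(p, w), paa(q, w), len(p)) / euclidean(p, q)


p = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
q = [2.0, 2.0, 2.0, 2.0, 6.0, 4.0, 9.0, 5.0]
for ratio in (1, 2, 4, 8):
    w = len(p) // ratio  # reduction ratio n / w (assumed meaning)
    print(ratio, round(tightness(p, q, w), 4))
# 1 1.0
# 2 0.5
# 4 0.2887
# 8 0.2887
```

At ratio 1 every segment holds a single point, so PAA is the identity and the bound is exact. For this pair the tightness then falls to about 0.29, in line with the trend described in the caption, although a single pair says nothing about averages over a dataset.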
Figure 6. Comparison of error rates between our proposed methods (NT PAA and BT PAA) and other methods (PAA and ED) on 24 data sets. Red dots in the lower region indicate that our method outperforms the existing one, blue triangles in the upper region indicate that the existing method is better, and green squares indicate equal error rates. 4.5 Comparison on Anomaly Detection. In this experiment, we …
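The error rates in Figure 6 come from similarity-based classification. A common protocol for this on the UCR archive is one-nearest-neighbour (1-NN) classification, which is assumed here. The sketch below computes a 1-NN error rate with a pluggable distance on a hypothetical two-class toy set of rising and falling series with equal means. It illustrates the idea rather than reproducing the figure: under a single-segment PAA distance every series looks identical, so ties are broken by training order.

```python
import math
from typing import Callable, List, Sequence, Tuple

Series = Sequence[float]
Labelled = Tuple[Series, str]


def mean(x: Series) -> float:
    return sum(x) / len(x)


def euclidean(p: Series, q: Series) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def paa_one_segment(p: Series, q: Series) -> float:
    """PAA distance with w = 1: sqrt(n) * |mean(p) - mean(q)|."""
    return math.sqrt(len(p)) * abs(mean(p) - mean(q))


def one_nn_error_rate(train: List[Labelled], test: List[Labelled],
                      dist: Callable[[Series, Series], float]) -> float:
    """Fraction of test series whose nearest training series has a different label.

    min() keeps the first of several equally close neighbours.
    """
    errors = 0
    for x, label in test:
        _, predicted = min(train, key=lambda item: dist(x, item[0]))
        if predicted != label:
            errors += 1
    return errors / len(test)


# Hypothetical toy data: all four series have mean 1.5.
train = [([0.0, 1.0, 2.0, 3.0], "up"), ([3.0, 2.0, 1.0, 0.0], "down")]
test = [([0.5, 1.0, 2.0, 2.5], "up"), ([2.5, 2.0, 1.0, 0.5], "down")]

print(one_nn_error_rate(train, test, euclidean))        # 0.0
print(one_nn_error_rate(train, test, paa_one_segment))  # 0.5
```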
Figure 7. Computation time for different time series with s ranging from 2 to 10 in anomaly detection.
Original abstract

Piecewise Aggregate Approximation (PAA) is a competitive basic dimension reduction method for high-dimensional time series mining. When deployed, however, its limitations are obvious: some important information, especially the trend, is missed. In this paper, we propose two new approaches for time series that utilize approximate trend feature information. Our first method records the trend using the relative mean value of each segment: it divides each segment into two parts and uses the numerical average of each part to represent the trend. We proved that this method satisfies the lower bound, which guarantees no false dismissals. Our second method uses a binary string, also relative to the mean of each segment, to record the trend. Our methods are applied to similarity measurement in classification and anomaly detection, and the experimental results show improved accuracy and effectiveness from extracting the trend feature suitably.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper proposes two extensions to Piecewise Aggregate Approximation (PAA) that incorporate trend information for time-series representation. The first splits each segment at its mean and records the two sub-segment averages; the authors state that this representation satisfies a lower bound guaranteeing no false dismissals. The second encodes trend via a binary string relative to each segment mean. Both representations are evaluated on similarity-based classification and anomaly detection, with reported gains in accuracy and effectiveness over standard PAA.

Significance. If the lower-bound property holds for the first method and the reported accuracy gains are reproducible with standard baselines and statistical controls, the work supplies a lightweight, trend-aware refinement of PAA that preserves the key indexing guarantee while addressing a known limitation of the original technique.

minor comments (3)
  1. The abstract states that the split-segment method 'satisfies lower bound' but does not name the distance measure or the precise lower-bound inequality; the manuscript should state the exact lower-bound relation (e.g., D_trend(P,Q) ≤ D(P,Q)) and the section containing its proof.
  2. The experimental claims rest on 'improvement of accuracy and effectiveness'; the paper should report the concrete distance measure used with each new representation, the number of datasets, and whether significance testing was performed.
  3. It is unclear whether the binary-string method is also claimed to obey a lower bound; if not, the manuscript should explicitly delimit the scope of the lower-bound guarantee.
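For reference, the derivation below is the standard lower bound for PAA with Eq. (2) (Keogh et al. [12]). It shows the kind of explicit relation comment 1 asks for. It is not the paper's proof for the split-segment representation, which this page does not reproduce. Each segment i covers an index set $S_i$ of size n/w, and $d_k = p_k - q_k$.

```latex
% Per segment, by the Cauchy--Schwarz inequality (\sum_{k \in S_i} d_k)^2 \le |S_i| \sum_{k \in S_i} d_k^2:
\bar{p}_i - \bar{q}_i = \frac{w}{n}\sum_{k \in S_i} d_k
\quad\Longrightarrow\quad
\frac{n}{w}\left(\bar{p}_i - \bar{q}_i\right)^2
  = \frac{w}{n}\Big(\sum_{k \in S_i} d_k\Big)^2
  \le \frac{w}{n}\cdot\frac{n}{w}\sum_{k \in S_i} d_k^2
  = \sum_{k \in S_i} d_k^2

% Summing over the w segments and taking square roots:
\mathrm{Dist}(\bar{Q}, \bar{P})
  = \sqrt{\frac{n}{w}}\sqrt{\sum_{i=1}^{w}\left(\bar{p}_i - \bar{q}_i\right)^2}
  \le \sqrt{\sum_{k=1}^{n}\left(p_k - q_k\right)^2} = ED(Q, P)
```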

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the careful summary of our work and the recommendation for minor revision. We are pleased that the potential utility of the trend-aware extensions to PAA, including the lower-bound guarantee for the first method, is recognized.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper's central claims are (1) a mathematical proof that the first proposed representation (relative mean values per split segment) satisfies a lower bound on distance that guarantees no false dismissals, and (2) empirical accuracy/effectiveness gains when the trend-augmented representations are used for classification and anomaly detection. The lower-bound result is presented as following directly from the definition of the distance measure and the segment-wise averaging construction; it is not obtained by fitting parameters to data and then relabeling the fit as a prediction. No load-bearing self-citations, uniqueness theorems imported from the authors' prior work, or ansatzes smuggled via citation are described. The experimental results are reported as observed outcomes of applying the new representations rather than as statistically forced predictions. The derivation chain is therefore self-contained against external benchmarks and does not reduce to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based on the abstract alone; the central claims rest on the validity of the claimed lower-bound proof for the split-average method and on the empirical gains observed in the reported experiments. No free parameters, new entities, or additional axioms are described.

axioms (1)
  • domain assumption: The first proposed trend approximation method (split-segment averages) satisfies the lower-bounding property for distance measures in time series similarity search.
    Explicitly claimed as proved for the first method in the abstract.
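The "no false dismissals" guarantee is the lower-bounding lemma of the GEMINI framework (Faloutsos et al. [7]). If the reduced-space distance never exceeds the true distance, a range query can prune candidates in the reduced space and then verify the survivors exactly without losing any true match. The sketch below illustrates this filter-and-refine pattern with standard PAA's distance (Eq. (2)), because the trend-aware distance is not given on this page. The function names and data are hypothetical.

```python
import math
from typing import List, Sequence


def paa(x: Sequence[float], w: int) -> List[float]:
    seg_len = len(x) // w  # assumes len(x) is a multiple of w
    return [sum(x[i * seg_len:(i + 1) * seg_len]) / seg_len for i in range(w)]


def euclidean(p: Sequence[float], q: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def paa_dist(p_bar: Sequence[float], q_bar: Sequence[float], n: int) -> float:
    w = len(p_bar)
    return math.sqrt(n / w) * math.sqrt(sum((a - b) ** 2 for a, b in zip(p_bar, q_bar)))


def range_query(db: List[Sequence[float]], query: Sequence[float], radius: float, w: int) -> List[int]:
    n = len(query)
    q_bar = paa(query, w)
    # Filter: the PAA distance lower-bounds ED, so anything pruned here cannot be a true match.
    candidates = [i for i, x in enumerate(db) if paa_dist(paa(x, w), q_bar, n) <= radius]
    # Refine: confirm each surviving candidate with the exact distance.
    return [i for i in candidates if euclidean(db[i], query) <= radius]


def brute_force(db: List[Sequence[float]], query: Sequence[float], radius: float) -> List[int]:
    return [i for i, x in enumerate(db) if euclidean(x, query) <= radius]


query = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
db = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],  # ED 0
    [2.0, 2.0, 2.0, 2.0, 6.0, 4.0, 9.0, 5.0],  # ED sqrt(24), about 4.90
    [8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0],  # ED sqrt(168), pruned by the filter
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 9.0],  # ED 1
]
print(range_query(db, query, 5.0, 4))  # [0, 1, 3]
print(brute_force(db, query, 5.0))     # [0, 1, 3]
```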

pith-pipeline@v0.9.0 · 5681 in / 1156 out tokens · 38508 ms · 2026-05-25T13:54:39.415008+00:00 · methodology


Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages

  [1] Breunig, M.M., Kriegel, H.P., Ng, R.T., Sander, J.: LOF: identifying density-based local outliers. In: ACM SIGMOD International Conference on Management of Data. pp. 93–104 (2000)
  [2] Cantrell, C.D.: Modern mathematical methods for physicists and engineers. Measurement Science & Technology 12(12), 2211 (2001)
  [3] Chan, K.P., Fu, W.C.: Efficient time series matching by wavelets. In: International Conference on Data Engineering, 1999. Proceedings. pp. 126–133 (1999)
  [4] Chen, Y., Keogh, E., Hu, B., Begum, N., Bagnall, A., Mueen, A., Batista, G.: The UCR time series classification archive (July 2015), www.cs.ucr.edu/~eamonn/time_series_data/
  [5] Chomboon, K., Chujai, P., Teerarassammee, P., Kerdprasop, K., Kerdprasop, N.: An empirical study of distance metrics for k-nearest neighbor algorithm. In: International Conference on Industrial Application Engineering. pp. 280–285 (2015)
  [6] Dersch, D.R., Dersch, D.R., Leinsinger, G.L., Hahn, K., Auer, D.: Cluster analysis of biomedical image time-series. International Journal of Computer Vision 46(2), 103–128 (2002)
  [7] Faloutsos, C., Ranganathan, M., Manolopoulos, Y.: Fast subsequence matching in time-series databases. International Conference on Management of Data 23(2), 419–429 (1994)
  [8] Guo, C., Li, H., Pan, D.: An improved piecewise aggregate approximation based on statistical features for time series mining. In: International Conference on Knowledge Science, Engineering and Management. pp. 234–244 (2010)
  [9] Himberg, J., Hyvärinen, A., Esposito, F.: Validating the independent components of neuroimaging time series via clustering and visualization. Neuroimage 22(3), 1214–1222 (2004)
  [10] Hu, L.Y., Huang, M.W., Ke, S.W., Tsai, C.F.: The distance function effect on k-nearest neighbor classification for medical datasets. Springerplus 5(1), 1304 (2016)
  [11] Kahveci, T., Singh, A.: Variable length queries for time series data. In: International Conference on Data Engineering, 2001. Proceedings. p. 273 (2002)
  [12] Keogh, E., Chakrabarti, K., Pazzani, M., Mehrotra, S.: Dimensionality reduction for fast similarity search in large time series databases. Knowledge & Information Systems 3(3), 263–286 (2001)
  [13] Landesberger, T.V., Brodkorb, F., Roskosch, P.: MobilityGraphs: Visual analysis of mass mobility dynamics via spatio-temporal graphs and clustering. IEEE Transactions on Visualization & Computer Graphics 22(1), 11–20 (2016)
  [14] Lin, J., Keogh, E., Lonardi, S., Chiu, B.: A symbolic representation of time series, with implications for streaming algorithms. In: ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery. pp. 2–11 (2003)
  [15] Paparrizos, J., Gravano, L.: k-Shape: Efficient and Accurate Clustering of Time Series. ACM (2016)
  [16] Rabiner, L., Juang, B.H.: Fundamentals of speech recognition 1(1), 353–356 (1993)
  [17] Rodriguez, A.C., Mozos, M.R.D.L.: Improving network security through traffic log anomaly detection using time series analysis. In: Computational Intelligence in Security for Information Systems 2010 - Proceedings of the International Conference on Computational Intelligence in Security for Information Systems. pp. 125–133 (2010)
  [18] Rui, N., Horta, N.: A new SAX-GA methodology applied to investment strategies optimization. In: Conference on Genetic and Evolutionary Computation. pp. 1055–1062 (2012)
  [19] Shokoohi-Yekta, M., Chen, Y., Campana, B., Hu, B., Zakaria, J., Keogh, E.: Discovery of meaningful rules in time series. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 1085–1094 (2015)
  [20] Storer, N., Storer, N., Storer, N., Storer, N., Storer, N.: LittleTable: A time-series database and its uses. In: ACM International Conference on Management of Data. pp. 125–138 (2017)
  [21] Sun, Y., Li, J., Liu, J., Sun, B., Chow, C.: An improvement of symbolic aggregate approximation distance measure for time series. Neurocomputing 138(11), 189–198 (2014)
  [22] Xi, X., Keogh, E., Shelton, C., Wei, L., Ratanamahatana, C.A.: Fast time series classification using numerosity reduction. In: International Conference. pp. 1033–1040 (2006)
  [23] Yi, B.K., Faloutsos, C.: Fast time sequence indexing for arbitrary Lp norms. In: Proceedings of the 26th International Conference on Very Large Data Bases. pp. 385–394 (2000)
  [24] Yong, Z., Tan, X., Xi, H.: A novel approach to network security situation awareness based on multi-perspective analysis. In: International Conference on Computational Intelligence and Security. pp. 768–772 (2007)
  [25] Yu, Q., Jibin, L., Jiang, L.: An improved ARIMA-based traffic anomaly detection algorithm for wireless sensor networks. International Journal of Distributed Sensor Networks 2016, 1–9 (2016)
  [26] Zhang, C., Ao, Y., Liu, H., Zhang, J.: Design and application of electrocardiograph diagnosis system based on multifractal theory. Chinese Journal of Network & Information Security (2017)
  [27] Zhang, C., Yin, A., Deng, Y., Tian, P., Wang, X., Dong, L.: A novel anomaly detection algorithm based on trident tree. In: Cloud Computing – CLOUD 2018. pp. 295–306 (2018)