pith. sign in

arxiv: 2604.26081 · v1 · submitted 2026-04-28 · 💻 cs.NI

On the Role of Time Series Clustering in Traffic Matrix Prediction

Pith reviewed 2026-05-07 14:24 UTC · model grok-4.3

classification 💻 cs.NI
keywords time series clusteringtraffic matrix predictionnetwork traffic forecastingAbilene datasetGÉANT datasetforecasting accuracyheterogeneous time series
0
0 comments X

The pith

Clustering traffic flows by their time-series behavior improves traffic matrix prediction accuracy over global models while staying much cheaper than predicting each flow separately.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes grouping network traffic flows into clusters based on shared temporal patterns and training separate predictors for each group rather than one model for the entire traffic matrix or one model per flow. Experiments on the Abilene and GÉANT datasets demonstrate that this approach consistently lowers prediction error relative to a single global forecaster. Most of the accuracy gain appears once a moderate number of clusters is reached, after which adding more clusters yields diminishing returns. Different ways of representing flows for clustering (histograms, autocorrelation, spectral density, or naive splits) produce different groupings yet deliver comparable final error rates. The central benefit therefore stems from breaking the heterogeneous prediction task into smaller, more uniform subproblems.

Core claim

By partitioning the flows of a traffic matrix into clusters according to histogram, autocorrelation function, power spectral density, or naive representations and fitting dedicated predictors inside each cluster, the resulting forecasts achieve lower root mean squared error than a single global model applied to all flows jointly, while incurring substantially lower computational cost than training an independent predictor for every individual flow.

What carries the argument

Clustering-based prediction framework that partitions flows using one of four representations (histogram, ACF, PSD, naive) and trains separate forecasters per cluster.

If this is right

  • Clustering yields most of its RMSE reduction at moderate cluster counts K, after which further increases produce only small additional gains.
  • Different clustering representations create dissimilar partitions of the flows yet reach nearly identical overall RMSE values.
  • The method remains substantially cheaper than fully local per-flow prediction while outperforming the global baseline.
  • The primary advantage arises from task decomposition rather than from the precise membership of any particular cluster.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same decomposition strategy could be tested on other heterogeneous multivariate time-series forecasting problems outside network traffic, such as electricity demand across regions or sensor streams from industrial plants.
  • If the main benefit is decomposition, then simpler or cheaper partitioning heuristics might substitute for the four examined representations without harming accuracy.
  • The observed plateau in gains at moderate K suggests an optimal cluster count could be chosen automatically by monitoring validation error rather than by exhaustive search.

Load-bearing premise

Traffic flows inside a single traffic matrix behave heterogeneously enough that a single joint forecaster loses accuracy compared with cluster-specific forecasters.

What would settle it

On the Abilene or GÉANT datasets, a global forecaster that matches or beats the RMSE of every clustered model at the same computational budget.

Figures

Figures reproduced from arXiv: 2604.26081 by Alexander M. Wyglinski, Charlotte Fowler, Martha Cash.

Figure 1
Figure 1. Figure 1: Example traffic flows from different source-destinat view at source ↗
Figure 2
Figure 2. Figure 2: Overview of the proposed cluster-based traffic matri view at source ↗
Figure 3
Figure 3. Figure 3: K-sweep results for each traffic-flow representation. Each panel shows normalized RMSE and runtime as functions of the number of clusters, K. The dotted vertical line indicates the selected value of K used in subsequent experiments. Specific K values are detailed in Table I. and repeat each sweep five times view at source ↗
Figure 4
Figure 4. Figure 4: Representative traffic-flow predictions for the Abil view at source ↗
read the original abstract

This paper analyzes the role of time-series clustering in traffic matrix (TM) prediction. Traffic flows within a TM often exhibit heterogeneous behavior, which can reduce the effectiveness of global forecasting models that predict all flows jointly. To address this, we propose a clustering-based prediction framework that groups flows into smaller subsets and trains separate predictors for each group. Four traffic-flow representations for clustering are explored, namely, histogram, autocorrelation function (ACF), power spectral density (PSD), and na\"ive partitioning, and how the representation choice and the number of clusters affect prediction performance. Experiments using the publicly available Abilene and G\'EANT datasets show that clustering consistently improves over global forecasting baselines, while remaining substantially less costly than local prediction. The results further show that most of the performance gain is achieved at moderate values of K, with diminishing returns as the number of clusters increases. Although different clustering representations produce different partitions of the traffic flows, they often achieve similar root mean squared error (RMSE). This suggests that the main benefit of clustering lies in decomposing the TM prediction task into smaller subproblems, while the exact cluster structure plays a more limited role in determining overall prediction accuracy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper analyzes the role of time series clustering in traffic matrix (TM) prediction. It proposes a framework that groups flows into clusters using four representations (histogram, autocorrelation function, power spectral density, and naïve partitioning), then trains separate predictors per cluster. Experiments on the Abilene and GÉANT datasets claim that clustering yields consistent RMSE improvements over global forecasting baselines at substantially lower cost than local per-flow prediction, with most gains achieved at moderate values of K and diminishing returns thereafter; different representations produce different partitions but often similar RMSE.

Significance. If the experimental protocol is free of data leakage, the results would indicate that clustering provides a practical middle ground for TM prediction by decomposing heterogeneous flows into smaller subproblems, delivering accuracy gains over global models without the full overhead of local models. The finding that representation choice has limited impact on final RMSE would further suggest that the benefit is primarily from task decomposition rather than cluster structure.

major comments (2)
  1. [Experimental protocol / Results] The experimental protocol (described in the methods and results sections) does not explicitly state that clustering features (ACF, PSD, histograms) are computed only on training windows. If these representations are derived from the full time series, cluster assignments encode test-period statistics, introducing leakage that turns the per-cluster predictors into partially supervised models and inflates the reported RMSE gains relative to true causal baselines.
  2. [Results] The claim of 'consistent improvements' over global baselines (abstract and §5) is not accompanied by details on the exact baseline models, train/test split ratios, presence of error bars, or statistical significance tests on RMSE differences. This prevents verification of whether the gains are robust or merely within noise.
minor comments (2)
  1. [Abstract] The abstract refers to 'global forecasting baselines' without naming the models or providing a brief citation; adding this would improve readability.
  2. [Results] A summary table of RMSE values across K and representations (currently described only in text) would allow readers to directly compare the diminishing-returns observation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review. The comments identify key areas where the experimental protocol and results presentation can be strengthened for clarity and rigor. We address each major comment below and indicate the revisions that will be incorporated into the next version of the manuscript.

read point-by-point responses
  1. Referee: The experimental protocol (described in the methods and results sections) does not explicitly state that clustering features (ACF, PSD, histograms) are computed only on training windows. If these representations are derived from the full time series, cluster assignments encode test-period statistics, introducing leakage that turns the per-cluster predictors into partially supervised models and inflates the reported RMSE gains relative to true causal baselines.

    Authors: We appreciate this observation. In the experiments, all clustering representations (histogram, ACF, and PSD) were computed exclusively on the training windows of each time series, with cluster assignments determined prior to predictor training and held fixed during evaluation. This ensures no information from the test period influences the clustering or subsequent predictions. However, we acknowledge that the manuscript does not state this explicitly. We will revise the methods section to include a clear statement that feature extraction for clustering is performed solely on training data, along with pseudocode or a diagram illustrating the temporal separation. revision: yes

  2. Referee: The claim of 'consistent improvements' over global baselines (abstract and §5) is not accompanied by details on the exact baseline models, train/test split ratios, presence of error bars, or statistical significance tests on RMSE differences. This prevents verification of whether the gains are robust or merely within noise.

    Authors: We agree that these details are necessary for full reproducibility and assessment of robustness. The global baselines consist of the identical forecasting architectures (ARIMA, Prophet, LSTM, and Transformer variants) trained jointly on all flows without clustering. The evaluation uses a 70/30 chronological train/test split with a rolling-window protocol (window size 1000 time steps, stride 100). We will augment §5 with a table reporting mean RMSE, standard deviation across 5 independent runs (different random seeds for model initialization and data shuffling within training), and p-values from paired Wilcoxon signed-rank tests comparing clustered vs. global RMSE per dataset and representation. These additions will also be reflected in the abstract and a new subsection on statistical analysis. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical evaluation only

full rationale

The paper reports an empirical comparison of clustering-based TM predictors against global baselines on the Abilene and GÉANT datasets. No mathematical derivation, equation, or first-principles result is presented that reduces the reported RMSE gains to a fitted parameter, self-referential quantity, or self-citation chain. Cluster assignments and per-cluster models are trained and evaluated as separate steps on real data; the performance differences are measured outcomes rather than identities by construction. Potential issues such as feature computation on full series constitute a methodological validity concern but do not match any enumerated circularity pattern and do not force the central claim to equal its inputs.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that traffic flows are heterogeneous enough to benefit from decomposition and on standard time-series feature extraction techniques; no new entities are postulated and the only tunable element is the number of clusters K, which is varied experimentally rather than fitted to produce the result.

free parameters (1)
  • number of clusters K
    Varied across experiments to measure performance; not a single fitted constant but an explicit hyperparameter whose effect is reported.
axioms (1)
  • domain assumption Traffic flows within a traffic matrix exhibit heterogeneous behavior that reduces the effectiveness of a single global forecasting model
    Stated directly in the abstract as the motivation for clustering.

pith-pipeline@v0.9.0 · 5503 in / 1352 out tokens · 77044 ms · 2026-05-07T14:24:03.714983+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages · 1 internal anchor

  1. [1]

    Internet traffic matrices: A primer,

    P . Tune, M. Roughan, H. Haddadi, and O. Bonaventure, “Internet traffic matrices: A primer,” Recent Advances in Networking , vol. 1, pp. 1–56, 2013

  2. [2]

    Traffic matrices: Balancing measurements, inference and modeling,

    A. S. et al., “Traffic matrices: Balancing measurements, inference and modeling,” SIGMETRICS Performance Evaluation Review , vol. 33, no. 1, pp. 362–373, 2005

  3. [3]

    Neutm: A neural network- based framework for traffic matrix prediction in sdn,

    A. Azzouni and G. Pujolle, “Neutm: A neural network- based framework for traffic matrix prediction in sdn,” in Proc. IEEE/IFIP NOMS , 2018, pp. 1–5

  4. [4]

    Deep learn- ing based network traffic matrix prediction,

    D. Aloraifan, I. Ahmad, and E. Alrashed, “Deep learn- ing based network traffic matrix prediction,” Interna- tional Journal of Intelligent Networks , vol. 2, pp. 46–56, 2021

  5. [5]

    Internet traffic prediction with deep neural networks,

    W . Jiang, “Internet traffic prediction with deep neural networks,” Internet T echnology Letters , vol. 5, no. 2, e314, 2022

  6. [6]

    Principles and algorithms for forecasting groups of time series: Lo- cality and globality,

    P . Montero-Manso and R. J. Hyndman, “Principles and algorithms for forecasting groups of time series: Lo- cality and globality,” arXiv preprint arXiv:2008.00444 ,

  7. [7]

    Available: https://arxiv.org/abs/2008

    [Online]. Available: https://arxiv.org/abs/2008. 00444

  8. [8]

    Forecast- ing across time series databases using long short-term memory networks on groups of similar series,

    K. Bandara, C. Bergmeir, and S. Smyl, “Forecast- ing across time series databases using long short-term memory networks on groups of similar series,” CoRR, vol. abs/1710.03222, 2017. [Online]. Available: http : //arxiv.org/abs/1710.03222

  9. [9]

    Zhang, Abilene network traffic matrices , https://www

    Y . Zhang, Abilene network traffic matrices , https://www. cs.utexas.edu/∼ yzhang/research/AbileneTM/, Accessed: Jan. 16, 2025, 2004. 11

  10. [10]

    Incorporating intra-flow dependencies and inter-flow correlations for traffic matrix prediction,

    K. Gao et al., “Incorporating intra-flow dependencies and inter-flow correlations for traffic matrix prediction,” in Proc. IEEE IWQoS , 2020, pp. 1–10

  11. [11]

    Internet traffic matrix prediction with con- volutional lstm neural network,

    W . Jiang, “Internet traffic matrix prediction with con- volutional lstm neural network,” Internet T echnology Letters, vol. 5, e322, 2022. DOI : 10.1002/itl2.322

  12. [12]

    Improving internet traffic matrix prediction via time series clustering,

    M. Cash and A. Wyglinski, “Improving internet traffic matrix prediction via time series clustering,” arXiv preprint arXiv:2509.15072 , 2025, Accepted to Interna- tional Conference on Machine Learning Applications,

  13. [13]

    Available: https://arxiv.org/abs/2509

    [Online]. Available: https://arxiv.org/abs/2509. 15072

  14. [14]

    Traffic prediction for dy- namic traffic engineering,

    T. Otoshi, Y . Ohsita, M. Murata, Y . Takahashi, K. Ishibashi, and K. Shiomoto, “Traffic prediction for dy- namic traffic engineering,” Computer Networks, vol. 85, pp. 36–50, 2015

  15. [15]

    Learning to route,

    A. V aladarsky, M. Schapira, D. Shahaf, and A. Tamar, “Learning to route,” in Proc. 16th ACM W orkshop on Hot T opics in Networks , 2017, pp. 185–191

  16. [16]

    Towards traffic matrix prediction with lstm recurrent neural networks,

    J. Zhao, H. Qu, J. Zhao, and D. Jiang, “Towards traffic matrix prediction with lstm recurrent neural networks,” Electronics Letters, vol. 54, no. 9, pp. 566–568, 2018

  17. [17]

    Deep learning-based traffic prediction for net- work optimization,

    S. Troia, R. Alvizu, Y . Zhou, G. Maier, and A. Pat- tavina, “Deep learning-based traffic prediction for net- work optimization,” in Proc. ICTON, 2018, pp. 1–4

  18. [18]

    Network traffic predic- tion using recurrent neural networks,

    N. Ramakrishnan and T. Soni, “Network traffic predic- tion using recurrent neural networks,” in Proc. IEEE ICMLA, 2018, pp. 187–193

  19. [19]

    An ai-based traffic matrix prediction solution for software- defined network,

    D.-H. Le, H.-A. Tran, S. Souihi, and A. Mellouk, “An ai-based traffic matrix prediction solution for software- defined network,” in Proc. IEEE ICC , 2021, pp. 1–6

  20. [20]

    Flow- by-flow traffic matrix prediction methods: Achieving accurate, adaptable, low cost results,

    W . Zheng, Y . Li, M. Hong, X. Fan, and G. Zhao, “Flow- by-flow traffic matrix prediction methods: Achieving accurate, adaptable, low cost results,” Computer Com- munications, vol. 194, pp. 348–360, 2022

  21. [21]

    Network Traffic Prediction Using PSO-LightGBM- TM,

    F. Li, W . Nie, K.-Y . Lam, B. Shen, and X. Li, “Network Traffic Prediction Using PSO-LightGBM- TM,” in IEEE INFOCOM 2024 - IEEE Conference on Computer Communications W orkshops (INFOCOM WKSHPS), May 2024, pp. 1–6. DOI : 10 . 1109 / INFOCOMWKSHPS61880 . 2024 . 10620828 Accessed: Apr. 1, 2025. [Online]. Available: https : / / ieeexplore . ieee.org/document/...

  22. [22]

    Pre- diction and correction of traffic matrix in an ip backbone network,

    W . Liu, A. Hong, L. Ou, W . Ding, and G. Zhang, “Pre- diction and correction of traffic matrix in an ip backbone network,” in Proc. IEEE IPCCC , 2014, pp. 1–9

  23. [23]

    Traffic matrix prediction based on deep learning for dynamic traffic engineering,

    Z. Liu, Z. Wang, X. Yin, X. Shi, Y . Guo, and Y . Tian, “Traffic matrix prediction based on deep learning for dynamic traffic engineering,” in Proc. IEEE ISCC, 2019, pp. 1–7

  24. [24]

    Time-series clustering - a decade review,

    S. Aghabozorgi, A. Seyed Shirkhorshidi, and T. Ying Wah, “Time-series clustering - a decade review,” Inf. Syst., vol. 53, no. C, pp. 16–38, Oct. 2015, ISSN : 0306-

  25. [25]

    Time-series clustering – A decade review , journal =

    DOI : 10 . 1016 / j . is . 2015 . 04 . 007 [Online]. Available: https://doi.org/10.1016/j.is.2015.04.007

  26. [26]

    Shapley values of reconstruction errors of pca for explaining anomaly detection,

    R. Ma and R. Angryk, “Distance and density clustering for time series data,” in 2017 IEEE International Con- ference on Data Mining W orkshops (ICDMW) , 2017, pp. 25–32. DOI : 10.1109/ICDMW .2017.11

  27. [27]

    Characteristic- based clustering for time series data,

    X. Wang, K. Smith, and R. Hyndman, “Characteristic- based clustering for time series data,” Data mining and knowledge Discovery, vol. 13, no. 3, pp. 335–364, 2006

  28. [28]

    Bridging the gap: A decade review of time-series clustering methods,

    J. Paparrizos, F. Y ang, and H. Li, “Bridging the gap: A decade review of time-series clustering methods,” arXiv preprint arXiv:2412.20582 , 2024. [Online]. Available: https://arxiv.org/abs/2412.20582

  29. [29]

    Forecasting histogram time se- ries with k-nearest neighbours methods,

    J. Arroyo and C. Mat´ e, “Forecasting histogram time se- ries with k-nearest neighbours methods,” International Journal of F orecasting , vol. 25, no. 1, pp. 192–207, 2009

  30. [30]

    Histogram-based cluster- ing of multiple data streams,

    A. Balzanella and R. V erde, “Histogram-based cluster- ing of multiple data streams,” Knowledge and Informa- tion Systems , vol. 62, no. 1, pp. 203–238, 2020

  31. [31]

    Forecasting across time series databases using recurrent neural networks on groups of similar series: A clustering approach,

    K. Bandara, C. Bergmeir, and S. Smyl, “Forecasting across time series databases using recurrent neural networks on groups of similar series: A clustering approach,” Expert systems with applications , vol. 140, p. 112 896, 2020

  32. [32]

    Feature-based classification of time-series data,

    A. Nanopoulos, R. Alcock, and Y . Manolopoulos, “Feature-based classification of time-series data,” Inter- national Journal of Computer Research , vol. 10, no. 3, pp. 49–61, 2001

  33. [33]

    K-shape: Efficient and accurate clustering of time series,

    J. Paparrizos and L. Gravano, “K-shape: Efficient and accurate clustering of time series,” in Proceedings of the 2015 ACM SIGMOD international conference on management of data , 2015, pp. 1855–1870

  34. [34]

    A hierarchical feature-based time series clustering approach for data-driven capacity planning of cellular networks,

    V . Jain, A. Richter, V . Fokow, M. Schweigel, U. Wet- zker, and A. Frotzscher, “A hierarchical feature-based time series clustering approach for data-driven capacity planning of cellular networks,” IEEE Transactions on Machine Learning in Communications and Networking , 2025

  35. [35]

    Application of agglomerative hi- erarchical clustering for clustering of time series data,

    A. Radovanovi´ c, J. Li, J. V . Milanovi´ c, N. Milosavl- jevi´ c, and R. Storchi, “Application of agglomerative hi- erarchical clustering for clustering of time series data,” in 2020 IEEE PES Innovative Smart Grid T echnologies Europe (ISGT-Europe) , 2020, pp. 640–644. DOI : 10 . 1109/ISGT-Europe47291.2020.9248759

  36. [36]

    A study of hierar- chical clustering algorithms,

    S. Patel, S. Sihmar, and A. Jatain, “A study of hierar- chical clustering algorithms,” in 2015 2nd International Conference on Computing for Sustainable Global De- velopment (INDIACom) , 2015, pp. 537–541

  37. [37]

    The impact of linkage meth- ods in hierarchical clustering for active learning to rank,

    Z. Li and M. de Rijke, “The impact of linkage meth- ods in hierarchical clustering for active learning to rank,” in Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, ser. SIGIR ’17, Shinjuku, Tokyo, Japan: Association for Computing Machinery, 2017, pp. 941–944, ISBN : 9781450350228. DOI : 10 ....

  38. [38]

    Time series clustering using fragmented autocorrelations,

    A. Albino, J. Caiado, and N. Crato, “Time series clustering using fragmented autocorrelations,” Physica A: Statistical Mechanics and its Applications , vol. 650, p. 129 981, 2024. 12

  39. [39]

    Robust clustering for time series using spectral densities and functional data analysis,

    D. Rivera-Garc´ ıa, L. A. Garc´ ıa-Escudero, A. Mayo- Iscar, and J. Ortega, “Robust clustering for time series using spectral densities and functional data analysis,” in International W ork-Conference on Artificial Neural Networks, Springer, 2017, pp. 142–153

  40. [40]

    The jensen-shannon divergence,

    M. L. Men´ endez, J. A. Pardo, L. Pardo, and M. d. C. Pardo, “The jensen-shannon divergence,” Journal of the Franklin Institute, vol. 334, no. 2, pp. 307–318, 1997

  41. [41]

    Performance guarantees for hierarchical clustering,

    S. Dasgupta and P . M. Long, “Performance guarantees for hierarchical clustering,” Journal of Computer and System Sciences , vol. 70, no. 4, pp. 555–569, 2005

  42. [42]

    The use of fast fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms,

    P . Welch, “The use of fast fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms,” IEEE Transactions on Audio and Electroacoustics , vol. 15, no. 2, pp. 70–73, 1967. DOI : 10 . 1109 / TAU . 1967 . 1161901

  43. [43]

    Providing public intradomain traffic matrices to the research community,

    S. Uhlig, B. Quoitin, J. Lepropre, and S. Balon, “Providing public intradomain traffic matrices to the research community,” SIGCOMM Comput. Commun. Rev., vol. 36, no. 1, pp. 83–86, 2006

  44. [44]

    Prophet: Traffic engineering-centric traffic matrix prediction,

    Y . Zhang et al., “Prophet: Traffic engineering-centric traffic matrix prediction,” IEEE/ACM Transactions on Networking, 2023

  45. [45]

    Comparing partitions,

    L. Hubert and P . Arabie, “Comparing partitions,” Jour- nal of classification , vol. 2, no. 1, pp. 193–218, 1985

  46. [46]

    Cluster ensembles—a knowl- edge reuse framework for combining multiple parti- tions,

    A. Strehl and J. Ghosh, “Cluster ensembles—a knowl- edge reuse framework for combining multiple parti- tions,” Journal of machine learning research , vol. 3, no. Dec, pp. 583–617, 2002

  47. [47]

    Adam: A Method for Stochastic Optimization

    D. P . Kingma and J. Ba, “Adam: A method for stochas- tic optimization,” arXiv preprint arXiv:1412.6980 , 2014

  48. [48]

    Finding a ´kneedle

    V . Satopaa, J. Albrecht, D. Irwin, and B. Raghavan, “Finding a ´kneedle” in a haystack: Detecting knee points in system behavior,” in 2011 31st international conference on distributed computing systems work- shops, IEEE, 2011, pp. 166–171