On the Role of Time Series Clustering in Traffic Matrix Prediction
Pith reviewed 2026-05-07 14:24 UTC · model grok-4.3
The pith
Clustering traffic flows by their time-series behavior improves traffic matrix prediction accuracy over global models while staying much cheaper than predicting each flow separately.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By partitioning the flows of a traffic matrix into clusters according to histogram, autocorrelation function, power spectral density, or naïve representations and fitting dedicated predictors inside each cluster, the resulting forecasts achieve lower root mean squared error than a single global model applied to all flows jointly, while incurring substantially lower computational cost than training an independent predictor for every individual flow.
What carries the argument
Clustering-based prediction framework that partitions flows using one of four representations (histogram, ACF, PSD, naïve) and trains separate forecasters per cluster.
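A minimal sketch of this kind of pipeline follows, under stated assumptions. It uses ACF features, a plain Lloyd's k-means, and a pooled least-squares AR(p) model per cluster. The k-means and AR choices, the synthetic flows, and every function name here are illustrative stand-ins and are not taken from the paper, which may use a different clustering algorithm and different forecasters.

```python
import numpy as np

def acf_features(series, max_lag=8):
    """Sample autocorrelation at lags 1..max_lag for one flow."""
    x = series - series.mean()
    denom = np.dot(x, x)
    if denom == 0.0:
        return np.zeros(max_lag)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (assumed stand-in for the paper's clustering step)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def lagged(series, p):
    """Build (X, y) pairs for a one-step-ahead AR(p) regression."""
    X = np.stack([series[i:i + p] for i in range(len(series) - p)])
    return X, series[p:]

def fit_cluster_models(flows, labels, p=4):
    """Fit one pooled least-squares AR(p) model per cluster (assumed forecaster)."""
    models = {}
    for c in np.unique(labels):
        parts = [lagged(flows[i], p) for i in np.where(labels == c)[0]]
        X = np.vstack([a for a, _ in parts])
        y = np.concatenate([b for _, b in parts])
        models[int(c)] = np.linalg.lstsq(X, y, rcond=None)[0]
    return models

# Synthetic flows (hypothetical data): two groups with different periodicity.
rng = np.random.default_rng(0)
t = np.arange(300)
flows = np.stack(
    [np.sin(2 * np.pi * t / 12) + 0.1 * rng.standard_normal(t.size) for _ in range(5)]
    + [np.sin(2 * np.pi * t / 5) + 0.1 * rng.standard_normal(t.size) for _ in range(5)]
)
features = np.stack([acf_features(f) for f in flows])
labels = kmeans(features, k=2)
models = fit_cluster_models(flows, labels)
```

Pooling all flows of a cluster into one regression is what keeps the cost between the global model (one fit) and the local approach (one fit per flow).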
If this is right
- Clustering yields most of its RMSE reduction at moderate cluster counts K, after which further increases produce only small additional gains.
- Different clustering representations create dissimilar partitions of the flows yet reach nearly identical overall RMSE values.
- The method remains substantially cheaper than fully local per-flow prediction while outperforming the global baseline.
- The primary advantage arises from task decomposition rather than from the precise membership of any particular cluster.
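One common way to quantify how dissimilar two partitions are is the adjusted Rand index (ARI), which is 1 for identical partitions and close to 0 for chance-level agreement. Whether the paper uses this exact measure is an assumption; the sketch below is a self-contained illustration with made-up label vectors.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same items."""
    n = len(labels_a)
    if n < 2:
        return 1.0
    # Pair counts within the contingency table and within each marginal.
    sum_pairs = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        return 1.0
    return (sum_pairs - expected) / (max_index - expected)

# Hypothetical cluster labels for four flows under two representations.
same_up_to_renaming = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
fully_crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

A low ARI between, say, histogram-based and ACF-based labels combined with near-equal RMSE would be the pattern described in the bullets above.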
Where Pith is reading between the lines
- The same decomposition strategy could be tested on other heterogeneous multivariate time-series forecasting problems outside network traffic, such as electricity demand across regions or sensor streams from industrial plants.
- If the main benefit is decomposition, then simpler or cheaper partitioning heuristics might substitute for the four examined representations without harming accuracy.
- The observed plateau in gains at moderate K suggests an optimal cluster count could be chosen automatically by monitoring validation error rather than by exhaustive search.
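The last point can be sketched as a simple stopping rule: walk up the candidate values of K and stop once the next step improves validation RMSE by less than a relative tolerance. The tolerance, the function name `pick_k`, and the RMSE values below are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def pick_k(val_rmse, rel_tol=0.02):
    """Return the smallest K after which the next K improves RMSE by less than rel_tol."""
    ks = sorted(val_rmse)
    for k, k_next in zip(ks, ks[1:]):
        gain = (val_rmse[k] - val_rmse[k_next]) / val_rmse[k]
        if gain < rel_tol:
            return k
    return ks[-1]

# Made-up validation curve: large early gains, then a plateau.
curve = {1: 1.00, 2: 0.80, 4: 0.70, 8: 0.69, 16: 0.688}
chosen = pick_k(curve)
```

With this made-up curve the rule stops at K = 4, because moving to K = 8 improves RMSE by only about 1.4 percent.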
Load-bearing premise
Traffic flows inside a single traffic matrix behave heterogeneously enough that a single joint forecaster loses accuracy compared with cluster-specific forecasters.
What would settle it
On the Abilene or GÉANT datasets, a global forecaster that matches or beats the RMSE of every clustered model at the same computational budget.
Original abstract
This paper analyzes the role of time-series clustering in traffic matrix (TM) prediction. Traffic flows within a TM often exhibit heterogeneous behavior, which can reduce the effectiveness of global forecasting models that predict all flows jointly. To address this, we propose a clustering-based prediction framework that groups flows into smaller subsets and trains separate predictors for each group. Four traffic-flow representations for clustering are explored, namely histogram, autocorrelation function (ACF), power spectral density (PSD), and naïve partitioning, along with how the representation choice and the number of clusters affect prediction performance. Experiments using the publicly available Abilene and GÉANT datasets show that clustering consistently improves over global forecasting baselines, while remaining substantially less costly than local prediction. The results further show that most of the performance gain is achieved at moderate values of K, with diminishing returns as the number of clusters increases. Although different clustering representations produce different partitions of the traffic flows, they often achieve similar root mean squared error (RMSE). This suggests that the main benefit of clustering lies in decomposing the TM prediction task into smaller subproblems, while the exact cluster structure plays a more limited role in determining overall prediction accuracy.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper analyzes the role of time series clustering in traffic matrix (TM) prediction. It proposes a framework that groups flows into clusters using four representations (histogram, autocorrelation function, power spectral density, and naïve partitioning), then trains separate predictors per cluster. The authors report that, in experiments on the Abilene and GÉANT datasets, clustering yields consistent RMSE improvements over global forecasting baselines at substantially lower cost than local per-flow prediction, with most gains achieved at moderate values of K and diminishing returns thereafter; different representations produce different partitions but often similar RMSE.
Significance. If the experimental protocol is free of data leakage, the results would indicate that clustering provides a practical middle ground for TM prediction by decomposing heterogeneous flows into smaller subproblems, delivering accuracy gains over global models without the full overhead of local models. The finding that representation choice has limited impact on final RMSE would further suggest that the benefit is primarily from task decomposition rather than cluster structure.
major comments (2)
- [Experimental protocol / Results] The experimental protocol (described in the methods and results sections) does not explicitly state that clustering features (ACF, PSD, histograms) are computed only on training windows. If these representations are derived from the full time series, cluster assignments encode test-period statistics, introducing leakage that turns the per-cluster predictors into partially supervised models and inflates the reported RMSE gains relative to true causal baselines.
- [Results] The claim of 'consistent improvements' over global baselines (abstract and §5) is not accompanied by details on the exact baseline models, train/test split ratios, presence of error bars, or statistical significance tests on RMSE differences. This prevents verification of whether the gains are robust or merely within noise.
minor comments (2)
- [Abstract] The abstract refers to 'global forecasting baselines' without naming the models or providing a brief citation; adding this would improve readability.
- [Results] A summary table of RMSE values across K and representations (currently described only in text) would allow readers to directly compare the diminishing-returns observation.
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review. The comments identify key areas where the experimental protocol and results presentation can be strengthened for clarity and rigor. We address each major comment below and indicate the revisions that will be incorporated into the next version of the manuscript.
-
Referee: The experimental protocol (described in the methods and results sections) does not explicitly state that clustering features (ACF, PSD, histograms) are computed only on training windows. If these representations are derived from the full time series, cluster assignments encode test-period statistics, introducing leakage that turns the per-cluster predictors into partially supervised models and inflates the reported RMSE gains relative to true causal baselines.
Authors: We appreciate this observation. In the experiments, all clustering representations (histogram, ACF, and PSD) were computed exclusively on the training windows of each time series, with cluster assignments determined prior to predictor training and held fixed during evaluation. This ensures no information from the test period influences the clustering or subsequent predictions. However, we acknowledge that the manuscript does not state this explicitly. We will revise the methods section to include a clear statement that feature extraction for clustering is performed solely on training data, along with pseudocode or a diagram illustrating the temporal separation.
revision: yes
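The temporal separation described in this response can be illustrated with a short sketch. It assumes histogram features whose bin edges are also derived from the training portion only, and a single 70/30 chronological cut; the function name, the bin count, and the synthetic gamma-distributed flows are hypothetical. The check at the end shows that tampering with the test period leaves the clustering features unchanged.

```python
import numpy as np

def train_only_histograms(flows, train_pct=70, bins=10):
    """Normalized histogram features computed from the training portion only."""
    split = flows.shape[1] * train_pct // 100
    train = flows[:, :split]
    # Bin edges come from training data too, so no test-period statistic leaks in.
    edges = np.linspace(train.min(), train.max(), bins + 1)
    feats = np.stack([np.histogram(row, bins=edges)[0] / row.size for row in train])
    return feats, split

# Synthetic flows (hypothetical data): 6 flows, 200 time steps each.
rng = np.random.default_rng(1)
flows = rng.gamma(shape=2.0, scale=1.0, size=(6, 200))
feats, split = train_only_histograms(flows)

# Tamper with the test period only; leak-free features must not change.
tampered = flows.copy()
tampered[:, split:] *= 100.0
feats_tampered, _ = train_only_histograms(tampered)
```

If the features had been computed on the full series, the tampered copy would produce different histograms and potentially different cluster assignments.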
-
Referee: The claim of 'consistent improvements' over global baselines (abstract and §5) is not accompanied by details on the exact baseline models, train/test split ratios, presence of error bars, or statistical significance tests on RMSE differences. This prevents verification of whether the gains are robust or merely within noise.
Authors: We agree that these details are necessary for full reproducibility and assessment of robustness. The global baselines consist of the identical forecasting architectures (ARIMA, Prophet, LSTM, and Transformer variants) trained jointly on all flows without clustering. The evaluation uses a 70/30 chronological train/test split with a rolling-window protocol (window size 1000 time steps, stride 100). We will augment §5 with a table reporting mean RMSE, standard deviation across 5 independent runs (different random seeds for model initialization and data shuffling within training), and p-values from paired Wilcoxon signed-rank tests comparing clustered vs. global RMSE per dataset and representation. These additions will also be reflected in the abstract and a new subsection on statistical analysis.
revision: yes
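The response does not say how the 70/30 split and the rolling windows combine. The sketch below shows one possible reading, stated as an assumption: windows of 1000 steps are taken every 100 steps, and each window is split chronologically into its first 70 percent for training and the remaining 30 percent for testing. The series length of 2000 steps and the function names are hypothetical.

```python
def rolling_windows(n_steps, window=1000, stride=100):
    """(start, end) index pairs, end exclusive, for windows fully inside the series."""
    return [(s, s + window) for s in range(0, n_steps - window + 1, stride)]

def split_window(start, end, train_pct=70):
    """Chronological train/test ranges inside one window (assumed interpretation)."""
    cut = start + (end - start) * train_pct // 100
    return (start, cut), (cut, end)

# Hypothetical series length; not a property of either dataset.
windows = rolling_windows(n_steps=2000)
folds = [split_window(s, e) for s, e in windows]
```

Under this reading, each fold would yield one clustered-versus-global RMSE pair, which is the kind of paired sample a Wilcoxon signed-rank test operates on.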
Circularity Check
No significant circularity; empirical evaluation only
The paper reports an empirical comparison of clustering-based TM predictors against global baselines on the Abilene and GÉANT datasets. No mathematical derivation, equation, or first-principles result is presented that reduces the reported RMSE gains to a fitted parameter, self-referential quantity, or self-citation chain. Cluster assignments and per-cluster models are trained and evaluated as separate steps on real data; the performance differences are measured outcomes rather than identities by construction. Potential issues such as feature computation on full series constitute a methodological validity concern but do not match any enumerated circularity pattern and do not force the central claim to equal its inputs.
Axiom & Free-Parameter Ledger
free parameters (1)
- number of clusters K
axioms (1)
- domain assumption: Traffic flows within a traffic matrix exhibit heterogeneous behavior that reduces the effectiveness of a single global forecasting model