On the Role of Time Series Clustering in Traffic Matrix Prediction

Alexander M. Wyglinski; Charlotte Fowler; Martha Cash

arxiv: 2604.26081 · v1 · submitted 2026-04-28 · 💻 cs.NI

On the Role of Time Series Clustering in Traffic Matrix Prediction

Martha Cash , Charlotte Fowler , Alexander M. Wyglinski This is my paper

Pith reviewed 2026-05-07 14:24 UTC · model grok-4.3

classification 💻 cs.NI

keywords time series clusteringtraffic matrix predictionnetwork traffic forecastingAbilene datasetGÉANT datasetforecasting accuracyheterogeneous time series

0 comments

The pith

Clustering traffic flows by their time-series behavior improves traffic matrix prediction accuracy over global models while staying much cheaper than predicting each flow separately.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes grouping network traffic flows into clusters based on shared temporal patterns and training separate predictors for each group rather than one model for the entire traffic matrix or one model per flow. Experiments on the Abilene and GÉANT datasets demonstrate that this approach consistently lowers prediction error relative to a single global forecaster. Most of the accuracy gain appears once a moderate number of clusters is reached, after which adding more clusters yields diminishing returns. Different ways of representing flows for clustering (histograms, autocorrelation, spectral density, or naive splits) produce different groupings yet deliver comparable final error rates. The central benefit therefore stems from breaking the heterogeneous prediction task into smaller, more uniform subproblems.

Core claim

By partitioning the flows of a traffic matrix into clusters according to histogram, autocorrelation function, power spectral density, or naive representations and fitting dedicated predictors inside each cluster, the resulting forecasts achieve lower root mean squared error than a single global model applied to all flows jointly, while incurring substantially lower computational cost than training an independent predictor for every individual flow.

What carries the argument

Clustering-based prediction framework that partitions flows using one of four representations (histogram, ACF, PSD, naive) and trains separate forecasters per cluster.

If this is right

Clustering yields most of its RMSE reduction at moderate cluster counts K, after which further increases produce only small additional gains.
Different clustering representations create dissimilar partitions of the flows yet reach nearly identical overall RMSE values.
The method remains substantially cheaper than fully local per-flow prediction while outperforming the global baseline.
The primary advantage arises from task decomposition rather than from the precise membership of any particular cluster.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same decomposition strategy could be tested on other heterogeneous multivariate time-series forecasting problems outside network traffic, such as electricity demand across regions or sensor streams from industrial plants.
If the main benefit is decomposition, then simpler or cheaper partitioning heuristics might substitute for the four examined representations without harming accuracy.
The observed plateau in gains at moderate K suggests an optimal cluster count could be chosen automatically by monitoring validation error rather than by exhaustive search.

Load-bearing premise

Traffic flows inside a single traffic matrix behave heterogeneously enough that a single joint forecaster loses accuracy compared with cluster-specific forecasters.

What would settle it

On the Abilene or GÉANT datasets, a global forecaster that matches or beats the RMSE of every clustered model at the same computational budget.

Figures

Figures reproduced from arXiv: 2604.26081 by Alexander M. Wyglinski, Charlotte Fowler, Martha Cash.

**Figure 1.** Figure 1: Example traffic flows from different source-destinat view at source ↗

**Figure 2.** Figure 2: Overview of the proposed cluster-based traffic matri view at source ↗

**Figure 3.** Figure 3: K-sweep results for each traffic-flow representation. Each panel shows normalized RMSE and runtime as functions of the number of clusters, K. The dotted vertical line indicates the selected value of K used in subsequent experiments. Specific K values are detailed in Table I. and repeat each sweep five times view at source ↗

**Figure 4.** Figure 4: Representative traffic-flow predictions for the Abil view at source ↗

read the original abstract

This paper analyzes the role of time-series clustering in traffic matrix (TM) prediction. Traffic flows within a TM often exhibit heterogeneous behavior, which can reduce the effectiveness of global forecasting models that predict all flows jointly. To address this, we propose a clustering-based prediction framework that groups flows into smaller subsets and trains separate predictors for each group. Four traffic-flow representations for clustering are explored, namely, histogram, autocorrelation function (ACF), power spectral density (PSD), and na\"ive partitioning, and how the representation choice and the number of clusters affect prediction performance. Experiments using the publicly available Abilene and G\'EANT datasets show that clustering consistently improves over global forecasting baselines, while remaining substantially less costly than local prediction. The results further show that most of the performance gain is achieved at moderate values of K, with diminishing returns as the number of clusters increases. Although different clustering representations produce different partitions of the traffic flows, they often achieve similar root mean squared error (RMSE). This suggests that the main benefit of clustering lies in decomposing the TM prediction task into smaller subproblems, while the exact cluster structure plays a more limited role in determining overall prediction accuracy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Clustering gives modest RMSE gains over global TM baselines with diminishing returns past moderate K, but the setup risks test leakage and thin baselines.

read the letter

The main thing to know is that grouping flows by time-series features and training separate predictors beats a single global model on the Abilene and GÉANT traces, yet most of the lift comes early and the four representations end up with similar final error despite different partitions. The benefit looks more like task decomposition than clever cluster structure. That observation is the paper's clearest empirical contribution. It also shows the cost stays well below full per-flow modeling, which matters for operational networks that cannot afford one model per OD pair. The public datasets and the systematic sweep over K and representation choice make the results easy to inspect. Those parts are done cleanly enough to be useful as a practical reference. The authors are right that heterogeneous flow behavior hurts global models and that moderate clustering captures most of the heterogeneity without over-fragmenting the data. That part of the argument holds up on the reported numbers. The soft spots are in the protocol details and the baseline strength. The stress-test concern about leakage is worth checking: if ACF, PSD, or histogram features are built on the full series rather than rolling training windows only, the cluster assignments incorporate test-period statistics and the per-cluster predictors are no longer strictly causal. The abstract and the high-level description do not explicitly rule this out, so the reported gains could be inflated relative to a true forecasting setup. The baselines are also limited to global versus clustered versus local; there is no comparison against stronger modern alternatives such as attention-based or graph neural predictors that already handle heterogeneity without explicit clustering. No error bars or significance tests appear in the summary, which leaves the “consistent improvements” claim harder to weigh. This paper is for network engineers or researchers who need a lightweight way to improve TM forecasts without moving to fully local models. A reader who wants a clear empirical demonstration that representation choice matters less than the act of splitting the problem will find value here. The work is coherent on its own terms and shows honest engagement with the datasets, so it deserves a serious referee once the clustering procedure and statistical reporting are clarified. I would send it to review with a request for those fixes rather than desk-reject.

Referee Report

2 major / 2 minor

Summary. The paper analyzes the role of time series clustering in traffic matrix (TM) prediction. It proposes a framework that groups flows into clusters using four representations (histogram, autocorrelation function, power spectral density, and naïve partitioning), then trains separate predictors per cluster. Experiments on the Abilene and GÉANT datasets claim that clustering yields consistent RMSE improvements over global forecasting baselines at substantially lower cost than local per-flow prediction, with most gains achieved at moderate values of K and diminishing returns thereafter; different representations produce different partitions but often similar RMSE.

Significance. If the experimental protocol is free of data leakage, the results would indicate that clustering provides a practical middle ground for TM prediction by decomposing heterogeneous flows into smaller subproblems, delivering accuracy gains over global models without the full overhead of local models. The finding that representation choice has limited impact on final RMSE would further suggest that the benefit is primarily from task decomposition rather than cluster structure.

major comments (2)

[Experimental protocol / Results] The experimental protocol (described in the methods and results sections) does not explicitly state that clustering features (ACF, PSD, histograms) are computed only on training windows. If these representations are derived from the full time series, cluster assignments encode test-period statistics, introducing leakage that turns the per-cluster predictors into partially supervised models and inflates the reported RMSE gains relative to true causal baselines.
[Results] The claim of 'consistent improvements' over global baselines (abstract and §5) is not accompanied by details on the exact baseline models, train/test split ratios, presence of error bars, or statistical significance tests on RMSE differences. This prevents verification of whether the gains are robust or merely within noise.

minor comments (2)

[Abstract] The abstract refers to 'global forecasting baselines' without naming the models or providing a brief citation; adding this would improve readability.
[Results] A summary table of RMSE values across K and representations (currently described only in text) would allow readers to directly compare the diminishing-returns observation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review. The comments identify key areas where the experimental protocol and results presentation can be strengthened for clarity and rigor. We address each major comment below and indicate the revisions that will be incorporated into the next version of the manuscript.

read point-by-point responses

Referee: The experimental protocol (described in the methods and results sections) does not explicitly state that clustering features (ACF, PSD, histograms) are computed only on training windows. If these representations are derived from the full time series, cluster assignments encode test-period statistics, introducing leakage that turns the per-cluster predictors into partially supervised models and inflates the reported RMSE gains relative to true causal baselines.

Authors: We appreciate this observation. In the experiments, all clustering representations (histogram, ACF, and PSD) were computed exclusively on the training windows of each time series, with cluster assignments determined prior to predictor training and held fixed during evaluation. This ensures no information from the test period influences the clustering or subsequent predictions. However, we acknowledge that the manuscript does not state this explicitly. We will revise the methods section to include a clear statement that feature extraction for clustering is performed solely on training data, along with pseudocode or a diagram illustrating the temporal separation. revision: yes
Referee: The claim of 'consistent improvements' over global baselines (abstract and §5) is not accompanied by details on the exact baseline models, train/test split ratios, presence of error bars, or statistical significance tests on RMSE differences. This prevents verification of whether the gains are robust or merely within noise.

Authors: We agree that these details are necessary for full reproducibility and assessment of robustness. The global baselines consist of the identical forecasting architectures (ARIMA, Prophet, LSTM, and Transformer variants) trained jointly on all flows without clustering. The evaluation uses a 70/30 chronological train/test split with a rolling-window protocol (window size 1000 time steps, stride 100). We will augment §5 with a table reporting mean RMSE, standard deviation across 5 independent runs (different random seeds for model initialization and data shuffling within training), and p-values from paired Wilcoxon signed-rank tests comparing clustered vs. global RMSE per dataset and representation. These additions will also be reflected in the abstract and a new subsection on statistical analysis. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical evaluation only

full rationale

The paper reports an empirical comparison of clustering-based TM predictors against global baselines on the Abilene and GÉANT datasets. No mathematical derivation, equation, or first-principles result is presented that reduces the reported RMSE gains to a fitted parameter, self-referential quantity, or self-citation chain. Cluster assignments and per-cluster models are trained and evaluated as separate steps on real data; the performance differences are measured outcomes rather than identities by construction. Potential issues such as feature computation on full series constitute a methodological validity concern but do not match any enumerated circularity pattern and do not force the central claim to equal its inputs.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that traffic flows are heterogeneous enough to benefit from decomposition and on standard time-series feature extraction techniques; no new entities are postulated and the only tunable element is the number of clusters K, which is varied experimentally rather than fitted to produce the result.

free parameters (1)

number of clusters K
Varied across experiments to measure performance; not a single fitted constant but an explicit hyperparameter whose effect is reported.

axioms (1)

domain assumption Traffic flows within a traffic matrix exhibit heterogeneous behavior that reduces the effectiveness of a single global forecasting model
Stated directly in the abstract as the motivation for clustering.

pith-pipeline@v0.9.0 · 5503 in / 1352 out tokens · 77044 ms · 2026-05-07T14:24:03.714983+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages · 1 internal anchor

[1]

Internet trafﬁc matrices: A primer,

P . Tune, M. Roughan, H. Haddadi, and O. Bonaventure, “Internet trafﬁc matrices: A primer,” Recent Advances in Networking , vol. 1, pp. 1–56, 2013

work page 2013
[2]

Trafﬁc matrices: Balancing measurements, inference and modeling,

A. S. et al., “Trafﬁc matrices: Balancing measurements, inference and modeling,” SIGMETRICS Performance Evaluation Review , vol. 33, no. 1, pp. 362–373, 2005

work page 2005
[3]

Neutm: A neural network- based framework for trafﬁc matrix prediction in sdn,

A. Azzouni and G. Pujolle, “Neutm: A neural network- based framework for trafﬁc matrix prediction in sdn,” in Proc. IEEE/IFIP NOMS , 2018, pp. 1–5

work page 2018
[4]

Deep learn- ing based network trafﬁc matrix prediction,

D. Aloraifan, I. Ahmad, and E. Alrashed, “Deep learn- ing based network trafﬁc matrix prediction,” Interna- tional Journal of Intelligent Networks , vol. 2, pp. 46–56, 2021

work page 2021
[5]

Internet trafﬁc prediction with deep neural networks,

W . Jiang, “Internet trafﬁc prediction with deep neural networks,” Internet T echnology Letters , vol. 5, no. 2, e314, 2022

work page 2022
[6]

Principles and algorithms for forecasting groups of time series: Lo- cality and globality,

P . Montero-Manso and R. J. Hyndman, “Principles and algorithms for forecasting groups of time series: Lo- cality and globality,” arXiv preprint arXiv:2008.00444 ,

work page arXiv 2008
[7]

Available: https://arxiv.org/abs/2008

[Online]. Available: https://arxiv.org/abs/2008. 00444

work page 2008
[8]

Forecast- ing across time series databases using long short-term memory networks on groups of similar series,

K. Bandara, C. Bergmeir, and S. Smyl, “Forecast- ing across time series databases using long short-term memory networks on groups of similar series,” CoRR, vol. abs/1710.03222, 2017. [Online]. Available: http : //arxiv.org/abs/1710.03222

work page arXiv 2017
[9]

Zhang, Abilene network trafﬁc matrices , https://www

Y . Zhang, Abilene network trafﬁc matrices , https://www. cs.utexas.edu/∼ yzhang/research/AbileneTM/, Accessed: Jan. 16, 2025, 2004. 11

work page 2025
[10]

Incorporating intra-ﬂow dependencies and inter-ﬂow correlations for trafﬁc matrix prediction,

K. Gao et al., “Incorporating intra-ﬂow dependencies and inter-ﬂow correlations for trafﬁc matrix prediction,” in Proc. IEEE IWQoS , 2020, pp. 1–10

work page 2020
[11]

Internet trafﬁc matrix prediction with con- volutional lstm neural network,

W . Jiang, “Internet trafﬁc matrix prediction with con- volutional lstm neural network,” Internet T echnology Letters, vol. 5, e322, 2022. DOI : 10.1002/itl2.322

work page doi:10.1002/itl2.322 2022
[12]

Improving internet trafﬁc matrix prediction via time series clustering,

M. Cash and A. Wyglinski, “Improving internet trafﬁc matrix prediction via time series clustering,” arXiv preprint arXiv:2509.15072 , 2025, Accepted to Interna- tional Conference on Machine Learning Applications,

work page arXiv 2025
[13]

Available: https://arxiv.org/abs/2509

[Online]. Available: https://arxiv.org/abs/2509. 15072

work page
[14]

Trafﬁc prediction for dy- namic trafﬁc engineering,

T. Otoshi, Y . Ohsita, M. Murata, Y . Takahashi, K. Ishibashi, and K. Shiomoto, “Trafﬁc prediction for dy- namic trafﬁc engineering,” Computer Networks, vol. 85, pp. 36–50, 2015

work page 2015
[15]

Learning to route,

A. V aladarsky, M. Schapira, D. Shahaf, and A. Tamar, “Learning to route,” in Proc. 16th ACM W orkshop on Hot T opics in Networks , 2017, pp. 185–191

work page 2017
[16]

Towards trafﬁc matrix prediction with lstm recurrent neural networks,

J. Zhao, H. Qu, J. Zhao, and D. Jiang, “Towards trafﬁc matrix prediction with lstm recurrent neural networks,” Electronics Letters, vol. 54, no. 9, pp. 566–568, 2018

work page 2018
[17]

Deep learning-based trafﬁc prediction for net- work optimization,

S. Troia, R. Alvizu, Y . Zhou, G. Maier, and A. Pat- tavina, “Deep learning-based trafﬁc prediction for net- work optimization,” in Proc. ICTON, 2018, pp. 1–4

work page 2018
[18]

Network trafﬁc predic- tion using recurrent neural networks,

N. Ramakrishnan and T. Soni, “Network trafﬁc predic- tion using recurrent neural networks,” in Proc. IEEE ICMLA, 2018, pp. 187–193

work page 2018
[19]

An ai-based trafﬁc matrix prediction solution for software- deﬁned network,

D.-H. Le, H.-A. Tran, S. Souihi, and A. Mellouk, “An ai-based trafﬁc matrix prediction solution for software- deﬁned network,” in Proc. IEEE ICC , 2021, pp. 1–6

work page 2021
[20]

Flow- by-ﬂow trafﬁc matrix prediction methods: Achieving accurate, adaptable, low cost results,

W . Zheng, Y . Li, M. Hong, X. Fan, and G. Zhao, “Flow- by-ﬂow trafﬁc matrix prediction methods: Achieving accurate, adaptable, low cost results,” Computer Com- munications, vol. 194, pp. 348–360, 2022

work page 2022
[21]

Network Trafﬁc Prediction Using PSO-LightGBM- TM,

F. Li, W . Nie, K.-Y . Lam, B. Shen, and X. Li, “Network Trafﬁc Prediction Using PSO-LightGBM- TM,” in IEEE INFOCOM 2024 - IEEE Conference on Computer Communications W orkshops (INFOCOM WKSHPS), May 2024, pp. 1–6. DOI : 10 . 1109 / INFOCOMWKSHPS61880 . 2024 . 10620828 Accessed: Apr. 1, 2025. [Online]. Available: https : / / ieeexplore . ieee.org/document/...

work page arXiv 2024
[22]

Pre- diction and correction of trafﬁc matrix in an ip backbone network,

W . Liu, A. Hong, L. Ou, W . Ding, and G. Zhang, “Pre- diction and correction of trafﬁc matrix in an ip backbone network,” in Proc. IEEE IPCCC , 2014, pp. 1–9

work page 2014
[23]

Trafﬁc matrix prediction based on deep learning for dynamic trafﬁc engineering,

Z. Liu, Z. Wang, X. Yin, X. Shi, Y . Guo, and Y . Tian, “Trafﬁc matrix prediction based on deep learning for dynamic trafﬁc engineering,” in Proc. IEEE ISCC, 2019, pp. 1–7

work page 2019
[24]

Time-series clustering - a decade review,

S. Aghabozorgi, A. Seyed Shirkhorshidi, and T. Ying Wah, “Time-series clustering - a decade review,” Inf. Syst., vol. 53, no. C, pp. 16–38, Oct. 2015, ISSN : 0306-

work page 2015
[25]

Time-series clustering – A decade review , journal =

DOI : 10 . 1016 / j . is . 2015 . 04 . 007 [Online]. Available: https://doi.org/10.1016/j.is.2015.04.007

work page doi:10.1016/j.is.2015.04.007 2015
[26]

Shapley values of reconstruction errors of pca for explaining anomaly detection,

R. Ma and R. Angryk, “Distance and density clustering for time series data,” in 2017 IEEE International Con- ference on Data Mining W orkshops (ICDMW) , 2017, pp. 25–32. DOI : 10.1109/ICDMW .2017.11

work page doi:10.1109/icdmw 2017
[27]

Characteristic- based clustering for time series data,

X. Wang, K. Smith, and R. Hyndman, “Characteristic- based clustering for time series data,” Data mining and knowledge Discovery, vol. 13, no. 3, pp. 335–364, 2006

work page 2006
[28]

Bridging the gap: A decade review of time-series clustering methods,

J. Paparrizos, F. Y ang, and H. Li, “Bridging the gap: A decade review of time-series clustering methods,” arXiv preprint arXiv:2412.20582 , 2024. [Online]. Available: https://arxiv.org/abs/2412.20582

work page arXiv 2024
[29]

Forecasting histogram time se- ries with k-nearest neighbours methods,

J. Arroyo and C. Mat´ e, “Forecasting histogram time se- ries with k-nearest neighbours methods,” International Journal of F orecasting , vol. 25, no. 1, pp. 192–207, 2009

work page 2009
[30]

Histogram-based cluster- ing of multiple data streams,

A. Balzanella and R. V erde, “Histogram-based cluster- ing of multiple data streams,” Knowledge and Informa- tion Systems , vol. 62, no. 1, pp. 203–238, 2020

work page 2020
[31]

Forecasting across time series databases using recurrent neural networks on groups of similar series: A clustering approach,

K. Bandara, C. Bergmeir, and S. Smyl, “Forecasting across time series databases using recurrent neural networks on groups of similar series: A clustering approach,” Expert systems with applications , vol. 140, p. 112 896, 2020

work page 2020
[32]

Feature-based classiﬁcation of time-series data,

A. Nanopoulos, R. Alcock, and Y . Manolopoulos, “Feature-based classiﬁcation of time-series data,” Inter- national Journal of Computer Research , vol. 10, no. 3, pp. 49–61, 2001

work page 2001
[33]

K-shape: Efﬁcient and accurate clustering of time series,

J. Paparrizos and L. Gravano, “K-shape: Efﬁcient and accurate clustering of time series,” in Proceedings of the 2015 ACM SIGMOD international conference on management of data , 2015, pp. 1855–1870

work page 2015
[34]

A hierarchical feature-based time series clustering approach for data-driven capacity planning of cellular networks,

V . Jain, A. Richter, V . Fokow, M. Schweigel, U. Wet- zker, and A. Frotzscher, “A hierarchical feature-based time series clustering approach for data-driven capacity planning of cellular networks,” IEEE Transactions on Machine Learning in Communications and Networking , 2025

work page 2025
[35]

Application of agglomerative hi- erarchical clustering for clustering of time series data,

A. Radovanovi´ c, J. Li, J. V . Milanovi´ c, N. Milosavl- jevi´ c, and R. Storchi, “Application of agglomerative hi- erarchical clustering for clustering of time series data,” in 2020 IEEE PES Innovative Smart Grid T echnologies Europe (ISGT-Europe) , 2020, pp. 640–644. DOI : 10 . 1109/ISGT-Europe47291.2020.9248759

work page arXiv 2020
[36]

A study of hierar- chical clustering algorithms,

S. Patel, S. Sihmar, and A. Jatain, “A study of hierar- chical clustering algorithms,” in 2015 2nd International Conference on Computing for Sustainable Global De- velopment (INDIACom) , 2015, pp. 537–541

work page 2015
[37]

The impact of linkage meth- ods in hierarchical clustering for active learning to rank,

Z. Li and M. de Rijke, “The impact of linkage meth- ods in hierarchical clustering for active learning to rank,” in Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, ser. SIGIR ’17, Shinjuku, Tokyo, Japan: Association for Computing Machinery, 2017, pp. 941–944, ISBN : 9781450350228. DOI : 10 ....

work page doi:10.1145/3077136.3080684 2017
[38]

Time series clustering using fragmented autocorrelations,

A. Albino, J. Caiado, and N. Crato, “Time series clustering using fragmented autocorrelations,” Physica A: Statistical Mechanics and its Applications , vol. 650, p. 129 981, 2024. 12

work page 2024
[39]

Robust clustering for time series using spectral densities and functional data analysis,

D. Rivera-Garc´ ıa, L. A. Garc´ ıa-Escudero, A. Mayo- Iscar, and J. Ortega, “Robust clustering for time series using spectral densities and functional data analysis,” in International W ork-Conference on Artiﬁcial Neural Networks, Springer, 2017, pp. 142–153

work page 2017
[40]

The jensen-shannon divergence,

M. L. Men´ endez, J. A. Pardo, L. Pardo, and M. d. C. Pardo, “The jensen-shannon divergence,” Journal of the Franklin Institute, vol. 334, no. 2, pp. 307–318, 1997

work page 1997
[41]

Performance guarantees for hierarchical clustering,

S. Dasgupta and P . M. Long, “Performance guarantees for hierarchical clustering,” Journal of Computer and System Sciences , vol. 70, no. 4, pp. 555–569, 2005

work page 2005
[42]

The use of fast fourier transform for the estimation of power spectra: A method based on time averaging over short, modiﬁed periodograms,

P . Welch, “The use of fast fourier transform for the estimation of power spectra: A method based on time averaging over short, modiﬁed periodograms,” IEEE Transactions on Audio and Electroacoustics , vol. 15, no. 2, pp. 70–73, 1967. DOI : 10 . 1109 / TAU . 1967 . 1161901

work page 1967
[43]

Providing public intradomain trafﬁc matrices to the research community,

S. Uhlig, B. Quoitin, J. Lepropre, and S. Balon, “Providing public intradomain trafﬁc matrices to the research community,” SIGCOMM Comput. Commun. Rev., vol. 36, no. 1, pp. 83–86, 2006

work page 2006
[44]

Prophet: Trafﬁc engineering-centric trafﬁc matrix prediction,

Y . Zhang et al., “Prophet: Trafﬁc engineering-centric trafﬁc matrix prediction,” IEEE/ACM Transactions on Networking, 2023

work page 2023
[45]

Comparing partitions,

L. Hubert and P . Arabie, “Comparing partitions,” Jour- nal of classiﬁcation , vol. 2, no. 1, pp. 193–218, 1985

work page 1985
[46]

Cluster ensembles—a knowl- edge reuse framework for combining multiple parti- tions,

A. Strehl and J. Ghosh, “Cluster ensembles—a knowl- edge reuse framework for combining multiple parti- tions,” Journal of machine learning research , vol. 3, no. Dec, pp. 583–617, 2002

work page 2002
[47]

Adam: A Method for Stochastic Optimization

D. P . Kingma and J. Ba, “Adam: A method for stochas- tic optimization,” arXiv preprint arXiv:1412.6980 , 2014

work page internal anchor Pith review arXiv 2014
[48]

Finding a ´kneedle

V . Satopaa, J. Albrecht, D. Irwin, and B. Raghavan, “Finding a ´kneedle” in a haystack: Detecting knee points in system behavior,” in 2011 31st international conference on distributed computing systems work- shops, IEEE, 2011, pp. 166–171

work page 2011

[1] [1]

Internet trafﬁc matrices: A primer,

P . Tune, M. Roughan, H. Haddadi, and O. Bonaventure, “Internet trafﬁc matrices: A primer,” Recent Advances in Networking , vol. 1, pp. 1–56, 2013

work page 2013

[2] [2]

Trafﬁc matrices: Balancing measurements, inference and modeling,

A. S. et al., “Trafﬁc matrices: Balancing measurements, inference and modeling,” SIGMETRICS Performance Evaluation Review , vol. 33, no. 1, pp. 362–373, 2005

work page 2005

[3] [3]

Neutm: A neural network- based framework for trafﬁc matrix prediction in sdn,

A. Azzouni and G. Pujolle, “Neutm: A neural network- based framework for trafﬁc matrix prediction in sdn,” in Proc. IEEE/IFIP NOMS , 2018, pp. 1–5

work page 2018

[4] [4]

Deep learn- ing based network trafﬁc matrix prediction,

D. Aloraifan, I. Ahmad, and E. Alrashed, “Deep learn- ing based network trafﬁc matrix prediction,” Interna- tional Journal of Intelligent Networks , vol. 2, pp. 46–56, 2021

work page 2021

[5] [5]

Internet trafﬁc prediction with deep neural networks,

W . Jiang, “Internet trafﬁc prediction with deep neural networks,” Internet T echnology Letters , vol. 5, no. 2, e314, 2022

work page 2022

[6] [6]

Principles and algorithms for forecasting groups of time series: Lo- cality and globality,

P . Montero-Manso and R. J. Hyndman, “Principles and algorithms for forecasting groups of time series: Lo- cality and globality,” arXiv preprint arXiv:2008.00444 ,

work page arXiv 2008

[7] [7]

Available: https://arxiv.org/abs/2008

[Online]. Available: https://arxiv.org/abs/2008. 00444

work page 2008

[8] [8]

Forecast- ing across time series databases using long short-term memory networks on groups of similar series,

K. Bandara, C. Bergmeir, and S. Smyl, “Forecast- ing across time series databases using long short-term memory networks on groups of similar series,” CoRR, vol. abs/1710.03222, 2017. [Online]. Available: http : //arxiv.org/abs/1710.03222

work page arXiv 2017

[9] [9]

Zhang, Abilene network trafﬁc matrices , https://www

Y . Zhang, Abilene network trafﬁc matrices , https://www. cs.utexas.edu/∼ yzhang/research/AbileneTM/, Accessed: Jan. 16, 2025, 2004. 11

work page 2025

[10] [10]

Incorporating intra-ﬂow dependencies and inter-ﬂow correlations for trafﬁc matrix prediction,

K. Gao et al., “Incorporating intra-ﬂow dependencies and inter-ﬂow correlations for trafﬁc matrix prediction,” in Proc. IEEE IWQoS , 2020, pp. 1–10

work page 2020

[11] [11]

Internet trafﬁc matrix prediction with con- volutional lstm neural network,

W . Jiang, “Internet trafﬁc matrix prediction with con- volutional lstm neural network,” Internet T echnology Letters, vol. 5, e322, 2022. DOI : 10.1002/itl2.322

work page doi:10.1002/itl2.322 2022

[12] [12]

Improving internet trafﬁc matrix prediction via time series clustering,

M. Cash and A. Wyglinski, “Improving internet trafﬁc matrix prediction via time series clustering,” arXiv preprint arXiv:2509.15072 , 2025, Accepted to Interna- tional Conference on Machine Learning Applications,

work page arXiv 2025

[13] [13]

Available: https://arxiv.org/abs/2509

[Online]. Available: https://arxiv.org/abs/2509. 15072

work page

[14] [14]

Trafﬁc prediction for dy- namic trafﬁc engineering,

T. Otoshi, Y . Ohsita, M. Murata, Y . Takahashi, K. Ishibashi, and K. Shiomoto, “Trafﬁc prediction for dy- namic trafﬁc engineering,” Computer Networks, vol. 85, pp. 36–50, 2015

work page 2015

[15] [15]

Learning to route,

A. V aladarsky, M. Schapira, D. Shahaf, and A. Tamar, “Learning to route,” in Proc. 16th ACM W orkshop on Hot T opics in Networks , 2017, pp. 185–191

work page 2017

[16] [16]

Towards trafﬁc matrix prediction with lstm recurrent neural networks,

J. Zhao, H. Qu, J. Zhao, and D. Jiang, “Towards trafﬁc matrix prediction with lstm recurrent neural networks,” Electronics Letters, vol. 54, no. 9, pp. 566–568, 2018

work page 2018

[17] [17]

Deep learning-based trafﬁc prediction for net- work optimization,

S. Troia, R. Alvizu, Y . Zhou, G. Maier, and A. Pat- tavina, “Deep learning-based trafﬁc prediction for net- work optimization,” in Proc. ICTON, 2018, pp. 1–4

work page 2018

[18] [18]

Network trafﬁc predic- tion using recurrent neural networks,

N. Ramakrishnan and T. Soni, “Network trafﬁc predic- tion using recurrent neural networks,” in Proc. IEEE ICMLA, 2018, pp. 187–193

work page 2018

[19] [19]

An ai-based trafﬁc matrix prediction solution for software- deﬁned network,

D.-H. Le, H.-A. Tran, S. Souihi, and A. Mellouk, “An ai-based trafﬁc matrix prediction solution for software- deﬁned network,” in Proc. IEEE ICC , 2021, pp. 1–6

work page 2021

[20] [20]

Flow- by-ﬂow trafﬁc matrix prediction methods: Achieving accurate, adaptable, low cost results,

W . Zheng, Y . Li, M. Hong, X. Fan, and G. Zhao, “Flow- by-ﬂow trafﬁc matrix prediction methods: Achieving accurate, adaptable, low cost results,” Computer Com- munications, vol. 194, pp. 348–360, 2022

work page 2022

[21] [21]

Network Trafﬁc Prediction Using PSO-LightGBM- TM,

F. Li, W . Nie, K.-Y . Lam, B. Shen, and X. Li, “Network Trafﬁc Prediction Using PSO-LightGBM- TM,” in IEEE INFOCOM 2024 - IEEE Conference on Computer Communications W orkshops (INFOCOM WKSHPS), May 2024, pp. 1–6. DOI : 10 . 1109 / INFOCOMWKSHPS61880 . 2024 . 10620828 Accessed: Apr. 1, 2025. [Online]. Available: https : / / ieeexplore . ieee.org/document/...

work page arXiv 2024

[22] [22]

Pre- diction and correction of trafﬁc matrix in an ip backbone network,

W . Liu, A. Hong, L. Ou, W . Ding, and G. Zhang, “Pre- diction and correction of trafﬁc matrix in an ip backbone network,” in Proc. IEEE IPCCC , 2014, pp. 1–9

work page 2014

[23] [23]

Trafﬁc matrix prediction based on deep learning for dynamic trafﬁc engineering,

Z. Liu, Z. Wang, X. Yin, X. Shi, Y . Guo, and Y . Tian, “Trafﬁc matrix prediction based on deep learning for dynamic trafﬁc engineering,” in Proc. IEEE ISCC, 2019, pp. 1–7

work page 2019

[24] [24]

Time-series clustering - a decade review,

S. Aghabozorgi, A. Seyed Shirkhorshidi, and T. Ying Wah, “Time-series clustering - a decade review,” Inf. Syst., vol. 53, no. C, pp. 16–38, Oct. 2015, ISSN : 0306-

work page 2015

[25] [25]

Time-series clustering – A decade review , journal =

DOI : 10 . 1016 / j . is . 2015 . 04 . 007 [Online]. Available: https://doi.org/10.1016/j.is.2015.04.007

work page doi:10.1016/j.is.2015.04.007 2015

[26] [26]

Shapley values of reconstruction errors of pca for explaining anomaly detection,

R. Ma and R. Angryk, “Distance and density clustering for time series data,” in 2017 IEEE International Con- ference on Data Mining W orkshops (ICDMW) , 2017, pp. 25–32. DOI : 10.1109/ICDMW .2017.11

work page doi:10.1109/icdmw 2017

[27] [27]

Characteristic- based clustering for time series data,

X. Wang, K. Smith, and R. Hyndman, “Characteristic- based clustering for time series data,” Data mining and knowledge Discovery, vol. 13, no. 3, pp. 335–364, 2006

work page 2006

[28] [28]

Bridging the gap: A decade review of time-series clustering methods,

J. Paparrizos, F. Y ang, and H. Li, “Bridging the gap: A decade review of time-series clustering methods,” arXiv preprint arXiv:2412.20582 , 2024. [Online]. Available: https://arxiv.org/abs/2412.20582

work page arXiv 2024

[29] [29]

Forecasting histogram time se- ries with k-nearest neighbours methods,

J. Arroyo and C. Mat´ e, “Forecasting histogram time se- ries with k-nearest neighbours methods,” International Journal of F orecasting , vol. 25, no. 1, pp. 192–207, 2009

work page 2009

[30] [30]

Histogram-based cluster- ing of multiple data streams,

A. Balzanella and R. V erde, “Histogram-based cluster- ing of multiple data streams,” Knowledge and Informa- tion Systems , vol. 62, no. 1, pp. 203–238, 2020

work page 2020

[31] [31]

Forecasting across time series databases using recurrent neural networks on groups of similar series: A clustering approach,

K. Bandara, C. Bergmeir, and S. Smyl, “Forecasting across time series databases using recurrent neural networks on groups of similar series: A clustering approach,” Expert systems with applications , vol. 140, p. 112 896, 2020

work page 2020

[32] [32]

Feature-based classiﬁcation of time-series data,

A. Nanopoulos, R. Alcock, and Y . Manolopoulos, “Feature-based classiﬁcation of time-series data,” Inter- national Journal of Computer Research , vol. 10, no. 3, pp. 49–61, 2001

work page 2001

[33] [33]

K-shape: Efﬁcient and accurate clustering of time series,

J. Paparrizos and L. Gravano, “K-shape: Efﬁcient and accurate clustering of time series,” in Proceedings of the 2015 ACM SIGMOD international conference on management of data , 2015, pp. 1855–1870

work page 2015

[34] [34]

A hierarchical feature-based time series clustering approach for data-driven capacity planning of cellular networks,

V . Jain, A. Richter, V . Fokow, M. Schweigel, U. Wet- zker, and A. Frotzscher, “A hierarchical feature-based time series clustering approach for data-driven capacity planning of cellular networks,” IEEE Transactions on Machine Learning in Communications and Networking , 2025

work page 2025

[35] [35]

Application of agglomerative hi- erarchical clustering for clustering of time series data,

A. Radovanovi´ c, J. Li, J. V . Milanovi´ c, N. Milosavl- jevi´ c, and R. Storchi, “Application of agglomerative hi- erarchical clustering for clustering of time series data,” in 2020 IEEE PES Innovative Smart Grid T echnologies Europe (ISGT-Europe) , 2020, pp. 640–644. DOI : 10 . 1109/ISGT-Europe47291.2020.9248759

work page arXiv 2020

[36] [36]

A study of hierar- chical clustering algorithms,

S. Patel, S. Sihmar, and A. Jatain, “A study of hierar- chical clustering algorithms,” in 2015 2nd International Conference on Computing for Sustainable Global De- velopment (INDIACom) , 2015, pp. 537–541

work page 2015

[37] [37]

The impact of linkage meth- ods in hierarchical clustering for active learning to rank,

Z. Li and M. de Rijke, “The impact of linkage meth- ods in hierarchical clustering for active learning to rank,” in Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, ser. SIGIR ’17, Shinjuku, Tokyo, Japan: Association for Computing Machinery, 2017, pp. 941–944, ISBN : 9781450350228. DOI : 10 ....

work page doi:10.1145/3077136.3080684 2017

[38] [38]

Time series clustering using fragmented autocorrelations,

A. Albino, J. Caiado, and N. Crato, “Time series clustering using fragmented autocorrelations,” Physica A: Statistical Mechanics and its Applications , vol. 650, p. 129 981, 2024. 12

work page 2024

[39] [39]

Robust clustering for time series using spectral densities and functional data analysis,

D. Rivera-Garc´ ıa, L. A. Garc´ ıa-Escudero, A. Mayo- Iscar, and J. Ortega, “Robust clustering for time series using spectral densities and functional data analysis,” in International W ork-Conference on Artiﬁcial Neural Networks, Springer, 2017, pp. 142–153

work page 2017

[40] [40]

The jensen-shannon divergence,

M. L. Men´ endez, J. A. Pardo, L. Pardo, and M. d. C. Pardo, “The jensen-shannon divergence,” Journal of the Franklin Institute, vol. 334, no. 2, pp. 307–318, 1997

work page 1997

[41] [41]

Performance guarantees for hierarchical clustering,

S. Dasgupta and P . M. Long, “Performance guarantees for hierarchical clustering,” Journal of Computer and System Sciences , vol. 70, no. 4, pp. 555–569, 2005

work page 2005

[42] [42]

The use of fast fourier transform for the estimation of power spectra: A method based on time averaging over short, modiﬁed periodograms,

P . Welch, “The use of fast fourier transform for the estimation of power spectra: A method based on time averaging over short, modiﬁed periodograms,” IEEE Transactions on Audio and Electroacoustics , vol. 15, no. 2, pp. 70–73, 1967. DOI : 10 . 1109 / TAU . 1967 . 1161901

work page 1967

[43] [43]

Providing public intradomain trafﬁc matrices to the research community,

S. Uhlig, B. Quoitin, J. Lepropre, and S. Balon, “Providing public intradomain trafﬁc matrices to the research community,” SIGCOMM Comput. Commun. Rev., vol. 36, no. 1, pp. 83–86, 2006

work page 2006

[44] [44]

Prophet: Trafﬁc engineering-centric trafﬁc matrix prediction,

Y . Zhang et al., “Prophet: Trafﬁc engineering-centric trafﬁc matrix prediction,” IEEE/ACM Transactions on Networking, 2023

work page 2023

[45] [45]

Comparing partitions,

L. Hubert and P . Arabie, “Comparing partitions,” Jour- nal of classiﬁcation , vol. 2, no. 1, pp. 193–218, 1985

work page 1985

[46] [46]

Cluster ensembles—a knowl- edge reuse framework for combining multiple parti- tions,

A. Strehl and J. Ghosh, “Cluster ensembles—a knowl- edge reuse framework for combining multiple parti- tions,” Journal of machine learning research , vol. 3, no. Dec, pp. 583–617, 2002

work page 2002

[47] [47]

Adam: A Method for Stochastic Optimization

D. P . Kingma and J. Ba, “Adam: A method for stochas- tic optimization,” arXiv preprint arXiv:1412.6980 , 2014

work page internal anchor Pith review arXiv 2014

[48] [48]

Finding a ´kneedle

V . Satopaa, J. Albrecht, D. Irwin, and B. Raghavan, “Finding a ´kneedle” in a haystack: Detecting knee points in system behavior,” in 2011 31st international conference on distributed computing systems work- shops, IEEE, 2011, pp. 166–171

work page 2011