
arXiv: 2510.06063 · v2 · pith:KHIPXZVFnew · submitted 2025-10-07 · 💻 cs.AI · cs.IT · cs.LG · math.IT

TelecomTS: A Multi-Modal Observability Dataset for Time Series and Language Analysis

Pith reviewed 2026-05-21 20:19 UTC · model grok-4.3

classification 💻 cs.AI · cs.IT · cs.LG · math.IT
keywords: observability data, time series, 5G network, anomaly detection, root cause analysis, multi-modal models, foundation models, benchmark dataset

The pith

A new 5G observability dataset shows that current time series and multi-modal models struggle with abrupt, noisy, high-variance dynamics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces TelecomTS, a large-scale dataset from a real 5G telecommunications network that supplies heterogeneous time series metrics with preserved absolute scale information rather than normalized or anonymized values. This data exhibits zero-inflation, high stochasticity, and minimal temporal structure unlike smoother domains such as climate or finance. Benchmarking state-of-the-art time series, language, reasoning, and multi-modal models on tasks including anomaly detection, root cause analysis, and multi-modal question-answering demonstrates that existing approaches have difficulty with the abrupt changes and noise typical of system monitoring. The work matters because observability data underpins enterprise system reliability, and the findings point toward the need for models that can directly use raw scale information in practical applications.
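The three properties named above (zero-inflation, high stochasticity, minimal temporal structure) can each be probed with a simple statistic. The sketch below is illustrative only: the function names are hypothetical, the two series are synthetic stand-ins rather than TelecomTS data, and the paper may characterize these properties differently.

```python
import random
import statistics


def zero_fraction(series):
    """Share of samples that are exactly zero (a simple zero-inflation indicator)."""
    return sum(1 for x in series if x == 0) / len(series)


def coefficient_of_variation(series):
    """Population standard deviation divided by the mean (relative variability)."""
    mean = statistics.fmean(series)
    return statistics.pstdev(series) / mean if mean != 0 else float("inf")


def lag1_autocorrelation(series):
    """Lag-1 autocorrelation; values near 0 suggest little short-range structure."""
    mean = statistics.fmean(series)
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        return 0.0
    num = sum((a - mean) * (b - mean) for a, b in zip(series, series[1:]))
    return num / denom


# Synthetic examples (assumptions for illustration, not real measurements).
rng = random.Random(0)
smooth = [20 + 0.01 * t for t in range(500)]  # slow drift, climate-like
bursty = [rng.choice([0, 0, 0, rng.uniform(0, 2000)]) for _ in range(500)]  # mostly zero, abrupt spikes

for name, series in [("smooth", smooth), ("bursty", bursty)]:
    print(
        name,
        round(zero_fraction(series), 3),
        round(coefficient_of_variation(series), 3),
        round(lag1_autocorrelation(series), 3),
    )
```

On series like these, the smooth one should show no zeros and a lag-1 autocorrelation close to 1, while the bursty one should show a large zero fraction, a much higher coefficient of variation, and autocorrelation near 0.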

Core claim

TelecomTS is a heterogeneous, de-anonymized observability dataset from a 5G network that retains explicit absolute scale in covariates and supports downstream tasks such as anomaly detection, root cause analysis, and multi-modal question-answering; evaluations show that existing time series, language, reasoning, and multi-modal foundation models struggle with its abrupt, noisy, and high-variance dynamics, underscoring the importance of preserving and natively leveraging scale information.

What carries the argument

The TelecomTS dataset supplying raw-scale 5G network metrics and associated multi-modal tasks that expose model limitations on stochastic observability data.

If this is right

  • Foundation time series models should be designed to accept and use absolute scale information in covariates rather than assuming normalized inputs.
  • Approaches trained primarily on low-variance domains will likely underperform on high-stochasticity monitoring data without adaptation.
  • Multi-modal models can now be directly compared on root cause analysis and question-answering using paired time series and textual descriptions from the same system.
  • Public benchmarks for observability applications must retain raw scale values to remain representative of production environments.
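To make the scale argument concrete, the following sketch shows how per-series z-normalization maps two series with the same shape but very different magnitudes onto the same values, so a model that only sees normalized input cannot tell them apart. The readings are hypothetical and `z_normalize` is an illustrative helper, not code from the paper.

```python
import statistics


def z_normalize(series):
    """Per-series standardization: subtract the mean, divide by the population std."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    if std == 0:
        return [0.0 for _ in series]
    return [(x - mean) / std for x in series]


# Hypothetical buffer-occupancy readings (made-up values for illustration).
light_load = [10.0, 12.0, 11.0, 13.0]
heavy_load = [1000.0, 1200.0, 1100.0, 1300.0]  # same shape, 100x the magnitude

norm_light = z_normalize(light_load)
norm_heavy = z_normalize(heavy_load)

# After normalization the two series coincide up to floating-point error,
# even though the raw magnitudes differ by two orders of magnitude.
same_after_norm = all(abs(a - b) < 1e-9 for a, b in zip(norm_light, norm_heavy))
print(same_after_norm)
```

If the magnitude itself carries the signal (for example, a buffer near capacity versus a nearly empty one), that signal is gone after this step unless the mean and standard deviation are passed to the model separately.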

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Comparable datasets from cloud or IoT monitoring could reveal whether the observed model limitations generalize beyond telecommunications.
  • Pretraining objectives that explicitly model zero-inflation and abrupt shifts might yield more robust observability-specific models.
  • Routine preservation of absolute scale during data collection could become standard practice for time series applications in other noisy domains.

Load-bearing premise

The dataset drawn from one 5G network is representative of general enterprise observability data and the selected tasks reflect authentic real-world challenges without selection bias.

What would settle it

A replication showing that current models reach high accuracy on TelecomTS tasks after standard fine-tuning, or that the same performance gap does not appear on other observability datasets, would challenge the central claim.

Figures

Figures reproduced from arXiv: 2510.06063 by Ali Maatouk, Andreas Varvarigos, Austin Feng, Daniela Fernandez, Ioannis Panitsas, Jialin Chen, Jinbiao Wei, Leandros Tassiulas, Rex Ying, Yuwei Guo.

Figure 1: An overview of TelecomTS, illustrating its data curation pipeline, covariate characteristics, and the range of supported multi-modal downstream tasks.
Figure 2: Overview of the 5G wireless network used for data collection: (a) mobile devices used …
Figure 3: An overview of the anomalies curation process.
Figure 4: An illustrative difference between UCR Archive Anomaly dataset and the anomalies found …
Figure 5: An overview of the Q&A dataset.
Figure 6: Illustration of a failure case that affected all …
Figure 7: Forecasting results of the highest-performing model (Informer) highlight key challenges: …
Figure 8: Randomly sampled variates from the ETTh1 dataset.
Figure 9: Randomly sampled variates from the MotorImagery dataset.
Figure 10: Randomly sampled variates from the FRED-MD dataset.
Figure 11: Randomly sampled variates from the WTH dataset.
Figure 12: Randomly sampled sequence from TelecomTS. Panels: (a) UL_MCS, (b) Estimated_UL_Buffer, (c) PRBs_DL_Current, (d) DL_BLER, (e) DL_MCS, (f) DL_Protocol (none, UDP, TCP).
Figure 13: Randomly sampled sequence from TelecomTS.
Figure 14: Spatial partitioning of the environment into 3 zones.
Figure 15: Spectrograms illustrating benign and adversarial interference patterns during collection.
Figure 16: Number of packets (top) and number of transmitted bytes (bottom) before (a) and after (b) …
Figure 17: Examples of anomaly effects under varying function types.
Original abstract

Modern enterprises generate vast streams of time series metrics when monitoring complex systems, known as observability data. Unlike conventional time series from domains such as climate, observability data are zero-inflated, highly stochastic, and exhibit minimal temporal structure. Despite their importance, observability datasets remain underrepresented in public benchmarks due to proprietary restrictions and privacy concerns. Existing datasets are often anonymized and normalized, removing scale information and limiting their use for tasks such as anomaly detection, root cause analysis, and multi-modal reasoning. To address this gap, we introduce TelecomTS, a large-scale observability dataset derived from a 5G telecommunications network. TelecomTS features heterogeneous, de-anonymized covariates with explicit absolute scale information and provides a diverse suite of downstream tasks, including anomaly detection, root cause analysis, and multi-modal question-answering. Benchmarking state-of-the-art time series, language, reasoning, and multi-modal foundation models reveals that existing approaches struggle with the abrupt, noisy, and high-variance dynamics characteristic of observability data. Our experiments further underscore the importance of preserving covariates' absolute scale, emphasizing the need for foundation time series models that natively leverage scale information for practical real-world observability applications. The code is available at: https://github.com/Ali-maatouk/TelecomTS.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces TelecomTS, a large-scale multi-modal observability dataset derived from a single 5G telecommunications network. It provides heterogeneous, de-anonymized covariates retaining absolute scale information and defines downstream tasks including anomaly detection, root cause analysis, and multi-modal question-answering. Benchmarking of state-of-the-art time series, language, reasoning, and multi-modal foundation models is reported to show that existing approaches struggle with the abrupt, noisy, and high-variance dynamics of observability data, with additional emphasis on the importance of preserving absolute scale.

Significance. If the empirical findings hold, the work is significant for releasing a public dataset that fills a gap in observability benchmarks, which are typically anonymized or normalized and thus limited for tasks requiring scale and noise modeling. The explicit availability of code at the cited GitHub repository supports reproducibility. The focus on scale information could usefully guide future foundation model development for real-world monitoring applications, though the single-network origin constrains broader generalization.

major comments (2)
  1. [Experiments section] The benchmarking claims that models struggle with observability dynamics are presented without sufficient detail on model variants, exact evaluation metrics, hyperparameter choices, or statistical significance tests; this prevents verification of the performance gaps and their attribution to abrupt/noisy characteristics rather than implementation choices.
  2. [Dataset and Tasks sections] The dataset is collected from one 5G deployment; without additional analysis or cross-validation showing that zero-inflation, covariate scales, and variance patterns are representative of general enterprise observability (rather than telecom-specific artifacts), the claim that SOTA models struggle with characteristic observability dynamics rests on an untested proxy assumption.
minor comments (2)
  1. [Abstract] The summary of benchmarking results does not name the specific models or tasks evaluated, reducing standalone clarity.
  2. [Figures] Time-series example plots would benefit from explicit scale annotations and legends to illustrate the absolute-scale preservation emphasized in the text.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment below and describe the changes we will make to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Experiments section] The benchmarking claims that models struggle with observability dynamics are presented without sufficient detail on model variants, exact evaluation metrics, hyperparameter choices, or statistical significance tests; this prevents verification of the performance gaps and their attribution to abrupt/noisy characteristics rather than implementation choices.

    Authors: We agree that the current level of detail is insufficient for independent verification. In the revised manuscript we will expand the Experiments section with (i) an exhaustive table of all model variants including architecture, parameter count, and fine-tuning procedure, (ii) precise definitions and formulas for every evaluation metric, (iii) the full hyperparameter search space and final selected values, and (iv) results of statistical significance tests (bootstrap confidence intervals and paired Wilcoxon tests) that quantify the performance gaps. These additions will allow readers to attribute differences more confidently to data characteristics.

    Revision: yes

  2. Referee: [Dataset and Tasks sections] The dataset is collected from one 5G deployment; without additional analysis or cross-validation showing that zero-inflation, covariate scales, and variance patterns are representative of general enterprise observability (rather than telecom-specific artifacts), the claim that SOTA models struggle with characteristic observability dynamics rests on an untested proxy assumption.

    Authors: We acknowledge that TelecomTS originates from a single network and that explicit cross-network validation is not feasible with the data we have access to. In the revision we will add a dedicated Limitations subsection that (a) qualifies the generalization claim, (b) cites domain literature indicating that zero-inflation, absolute-scale heterogeneity, and abrupt variance are common across enterprise observability platforms, and (c) positions TelecomTS as an initial public benchmark rather than a definitive universal proxy. We will also soften the language in the abstract and introduction to reflect this scope.

    Revision: partial
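The bootstrap confidence intervals mentioned in the first response could be computed roughly as follows. This is a minimal sketch of a percentile bootstrap over paired per-item score differences; the function name, the resample count, and the scores are all assumptions made for illustration, and the simulated rebuttal does not specify the actual procedure.

```python
import random
import statistics


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(scores_a - scores_b) over paired items."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired scores must have equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    rng = random.Random(seed)
    n = len(diffs)
    # Resample the paired differences with replacement and record each mean.
    means = sorted(
        statistics.fmean(rng.choices(diffs, k=n)) for _ in range(n_resamples)
    )
    low = means[int((alpha / 2) * n_resamples)]
    high = means[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(diffs), low, high


# Made-up per-sequence scores for two hypothetical models (not results from the paper).
model_a = [0.62, 0.58, 0.71, 0.66, 0.60, 0.69, 0.64, 0.67]
model_b = [0.55, 0.57, 0.63, 0.61, 0.58, 0.60, 0.59, 0.62]

mean_diff, ci_low, ci_high = paired_bootstrap_ci(model_a, model_b)
print(f"mean difference {mean_diff:.4f}, 95% CI [{ci_low:.4f}, {ci_high:.4f}]")
```

An interval that excludes zero suggests the gap between the two models is unlikely to be a resampling artifact. A paired Wilcoxon signed-rank test on the same differences (for instance via `scipy.stats.wilcoxon`) would give a complementary, rank-based check.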

Circularity Check

0 steps flagged

Dataset release and empirical benchmarking exhibit no circularity

full rationale

The paper's core contribution is the release of TelecomTS, a new 5G-derived observability dataset, together with downstream tasks (anomaly detection, root cause analysis, multi-modal QA) and benchmarking of existing foundation models. No derivation chain, equations, or fitted parameters are claimed; the reported model struggles are direct empirical observations on the released data rather than reductions to prior fits or self-citations. The work is self-contained against external benchmarks because the dataset and tasks are newly introduced and the evaluation uses standard public models without load-bearing self-referential premises.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper contributes a new empirical dataset rather than relying on fitted parameters or new theoretical entities; the main premises are domain descriptions of observability data.

axioms (1)
  • Domain assumption: Observability data are zero-inflated, highly stochastic, and exhibit minimal temporal structure.
    Stated directly in the abstract as distinguishing characteristics of the data domain.

pith-pipeline@v0.9.0 · 5807 in / 1242 out tokens · 41177 ms · 2026-05-21T20:19:42.474983+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Towards Resilient and Autonomous Networks: A BlueSky Vision on AI-Native 6G

    cs.AI 2026-05 unverdicted novelty 4.0

    The paper envisions AI-native 6G networks anchored by a foundation model and multi-agent systems to shift network management to a unified multi-modal optimization problem.

Reference graph

Works this paper leans on

59 extracted references · 59 canonical work pages · cited by 1 Pith paper · 3 internal anchors

  1. [1]

    arXiv preprint arXiv:2410.10393(2024)

    Taha Aksu, Gerald Woo, Juncheng Liu, Xu Liu, Chenghao Liu, Silvio Savarese, Caiming Xiong, and Doyen Sahoo. Gift-eval: A benchmark for general time series forecasting model evaluation. arxiv preprint arxiv:2410.10393, 2024

  2. [2]

    Maddix, Michael W

    Abdul Fatir Ansari, Lorenzo Stella, Caner Turkmen, Xiyuan Zhang, Pedro Mercado, Huibin Shen, Oleksandr Shchur, Syama Syndar Rangapuram, Sebastian Pineda Arango, Shubham Kapoor, Jasper Zschiegner, Danielle C. Maddix, Michael W. Mahoney, Kari Torkkola, Andrew Gordon Wilson, Michael Bohlke-Schneider, and Yuyang Wang. Chronos: Learning the language of time se...

  3. [3]

    Flu Portal Dashboard

    Centers for Disease Control and Prevention. Flu Portal Dashboard. https://gis.cdc.gov/ grasp/fluview/fluportaldashboard.html, 2017. Online; accessed 21 May 2025

  4. [4]

    Mtbench: A multimodal time series benchmark for temporal reasoning and question answering, 2025

    Jialin Chen, Aosong Feng, Ziyu Zhao, Juan Garza, Gaukhar Nurbek, Cheng Qin, Ali Maatouk, Leandros Tassiulas, Yifeng Gao, and Rex Ying. Mtbench: A multimodal time series benchmark for temporal reasoning and question answering, 2025. URL https://arxiv.org/abs/2503. 16858

  5. [5]

    This time is different: An observability perspective on time series foundation models, 2025

    Ben Cohen, Emaad Khwaja, Youssef Doubli, Salahidine Lemaachi, Chris Lettieri, Charles Masson, Hugo Miccinilli, Elise Ramé, Qiqi Ren, Afshin Rostamizadeh, Jean Ogier du Terrail, Anna-Monica Toon, Kan Wang, Stephan Xie, Zongzhe Xu, Viktoriya Zhukova, David Asker, Ameet Talwalkar, and Othmane Abou-Amal. This time is different: An observability perspective on...

  6. [6]

    A decoder-only foundation model for time-series forecasting

    Abhimanyu Das, Weihao Kong, Rajat Sen, and Yichen Zhou. A decoder-only foundation model for time-series forecasting, 2024. URLhttps://arxiv.org/abs/2310.10688

  7. [7]

    Dataset card for boom (benchmark of observability metrics), 2024

    Datadog. Dataset card for boom (benchmark of observability metrics), 2024. URL https: //huggingface.co/datasets/Datadog/BOOM. Available on Hugging Face Datasets

  8. [8]

    The ucr time series classification archive, October 2018

    Hoang Anh Dau, Eamonn Keogh, Kaveh Kamgar, Chin-Chia Michael Yeh, Yan Zhu, Shaghayegh Gharghabi, Chotirat Ann Ratanamahatana, Yanping, Bing Hu, Nurjahan Begum, Anthony Bagnall, Abdullah Mueen, Gustavo Batista, and Hexagon-ML. The ucr time series classification archive, October 2018

  9. [9]

    An interactive web-based dashboard to track covid-19 in real time.The Lancet Infectious Diseases, 20(5):533–534, 2020

    Ensheng Dong, Hongru Du, and Lauren Gardner. An interactive web-based dashboard to track covid-19 in real time.The Lancet Infectious Diseases, 20(5):533–534, 2020. ISSN 1473-3099. doi: https://doi.org/10.1016/S1473-3099(20)30120-1. URL https://www.sciencedirect. com/science/article/pii/S1473309920301201

  10. [10]

    Farahani, M.R

    Mojtaba A. Farahani, M.R. McCormick, Robert Gianinny, Frank Hudacheck, Ramy Harik, Zhichao Liu, and Thorsten Wuest. Time-series pattern recognition in smart manufacturing systems: A literature review and ontology.Journal of Manufacturing Systems, 69:208–241,

  11. [11]

    doi: https://doi.org/10.1016/j.jmsy.2023.05.025

    ISSN 0278-6125. doi: https://doi.org/10.1016/j.jmsy.2023.05.025. URL https://www. sciencedirect.com/science/article/pii/S0278612523000997

  12. [12]

    Fassois and John S

    Spilios D. Fassois and John S. Sakellariou.Statistical Time Series Methods for SHM. John Wiley & Sons, Ltd, 2009. ISBN 9780470061626. doi: https://doi.org/10.1002/ 9780470061626.shm044. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/ 9780470061626.shm044

  13. [13]

    In-context fine-tuning for time- series foundation models

    Matthew Faw, Rajat Sen, Yichen Zhou, and Abhimanyu Das. In-context fine-tuning for time- series foundation models. InForty-second International Conference on Machine Learning,

  14. [14]

    URLhttps://openreview.net/forum?id=uxzgGLWPj2

  15. [15]

    Mantis: Lightweight calibrated foundation model for user-friendly time series classification

    Vasilii Feofanov, Songkang Wen, Marius Alonso, Romain Ilbert, Hongbo Guo, Malik Tiomoko, Lujia Pan, Jianfeng Zhang, and Ievgen Redko. Mantis: Lightweight calibrated foundation model for user-friendly time series classification, 2025. URL https://arxiv.org/abs/ 2502.15637. 10

  16. [16]

    Forouzan and Sophia Chung Fegan.TCP/IP Protocol Suite

    Behrouz A. Forouzan and Sophia Chung Fegan.TCP/IP Protocol Suite. McGraw-Hill Higher Education, 2nd edition, 2002. ISBN 0072460601

  17. [17]

    Webb, Rob Hyndman, and Pablo Montero-Manso

    Rakshitha Wathsadini Godahewa, Christoph Bergmeir, Geoffrey I. Webb, Rob Hyndman, and Pablo Montero-Manso. Monash time series forecasting archive. InThirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021. URL https://openreview.net/forum?id=wEc1mgAjU-

  18. [18]

    Moment: a family of open time-series foundation models

    Mononito Goswami, Konrad Szafer, Arjun Choudhry, Yifu Cai, Shuo Li, and Artur Dubrawski. Moment: a family of open time-series foundation models. InProceedings of the 41st Interna- tional Conference on Machine Learning, ICML’24. JMLR.org, 2024

  19. [19]

    Root cause analysis of anomalies in 5g ran using graph neural network and transformer, 2024

    Antor Hasan, Conrado Boeira, Khaleda Papry, Yue Ju, Zhongwen Zhu, and Israat Haque. Root cause analysis of anomalies in 5g ran using graph neural network and transformer, 2024. URL https://arxiv.org/abs/2406.15638

  20. [20]

    Root cause analysis of the most common network and user experience prob- lems, March 2021

    Muhammad Haseeb. Root cause analysis of the most common network and user experience prob- lems, March 2021. URL https://www.networkcomputing.com/network-security/ root-cause-analysis-of-the-most-common-network-and-user-experience-problems . Accessed: October 8, 2025

  21. [21]

    FinTSB: A Comprehensive and Practical Benchmark for Financial Time Series Forecasting

    Yifan Hu, Yuante Li, Peiyuan Liu, Yuxia Zhu, Naiqi Li, Tao Dai, Shu tao Xia, Dawei Cheng, and Changjun Jiang. Fintsb: A comprehensive and practical benchmark for financial time series forecasting, 2025. URLhttps://arxiv.org/abs/2502.18834

  22. [22]

    Libcity: A unified library towards efficient and comprehensive urban spatial-temporal prediction, 2024

    Jiawei Jiang, Chengkai Han, Wenjun Jiang, Wayne Xin Zhao, and Jingyuan Wang. Libcity: A unified library towards efficient and comprehensive urban spatial-temporal prediction, 2024. URLhttps://arxiv.org/abs/2304.14343

  23. [23]

    Deep learning for time series fore- casting: a survey.International Journal of Machine Learning and Cybernetics, 16(7–8): 5079–5112, February 2025

    Xiangjie Kong, Zhenghao Chen, Weiyao Liu, Kaili Ning, Lechao Zhang, Syauqie Muham- mad Marier, Yichen Liu, Yuhao Chen, and Feng Xia. Deep learning for time series fore- casting: a survey.International Journal of Machine Learning and Cybernetics, 16(7–8): 5079–5112, February 2025. ISSN 1868-808X. doi: 10.1007/s13042-025-02560-w. URL http://dx.doi.org/10.10...

  24. [24]

    Time-mqa: Time series multi-task question answering with context enhancement, 2025

    Yaxuan Kong, Yiyuan Yang, Yoontae Hwang, Wenjie Du, Stefan Zohren, Zhangyang Wang, Ming Jin, and Qingsong Wen. Time-mqa: Time series multi-task question answering with context enhancement, 2025. URLhttps://arxiv.org/abs/2503.01875

  25. [25]

    Foundation models for time series: A survey,

    Siva Rama Krishna Kottapalli, Karthik Hubli, Sandeep Chandrashekhara, Garima Jain, Sunayana Hubli, Gayathri Botla, and Ramesh Doddaiah. Foundation models for time series: A survey,

  26. [26]

    URLhttps://arxiv.org/abs/2504.04011

  27. [27]

    Nutime: Numerically multi-scaled embedding for large-scale time-series pretraining

    Chenguo Lin, Xumeng Wen, Wei Cao, Congrui Huang, Jiang Bian, Stephen Lin, and Zhirong Wu. Nutime: Numerically multi-scaled embedding for large-scale time-series pretraining. Transactions on Machine Learning Research (TMLR), 2024

  28. [28]

    Sasanur, Megha Sharma, Jiaming Cui, Qingsong Wen, Chao Zhang, and B

    Haoxin Liu, Shangqing Xu, Zhiyuan Zhao, Lingkai Kong, Harshavardhan Kamarthi, Aditya B. Sasanur, Megha Sharma, Jiaming Cui, Qingsong Wen, Chao Zhang, and B. Aditya Prakash. Time-MMD: Multi-domain multimodal dataset for time series analysis. InThe Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2024. URLhttps...

  29. [29]

    Sasanur, Megha Sharma, Jiaming Cui, Qingsong Wen, Chao Zhang, and B

    Haoxin Liu, Shangqing Xu, Zhiyuan Zhao, Lingkai Kong, Harshavardhan Kamarthi, Aditya B. Sasanur, Megha Sharma, Jiaming Cui, Qingsong Wen, Chao Zhang, and B. Aditya Prakash. Time-mmd: Multi-domain multimodal dataset for time series analysis, 2025. URL https: //arxiv.org/abs/2406.08627

  30. [30]

    Root cause analysis based on trace for mobile network problem

    Liang Liu, Xinzhou Cheng, Jiajia Zhu, Lexi Xu, Songbai Liang, Lijun Cheng, Jinyu Zhai, and Fred Dong. Root cause analysis based on trace for mobile network problem. In Yue Wang, Yuyang Liu, Jiaqi Zou, and Mengyao Huo (eds.),Signal and Information Processing, Networking and Computers, pp. 1185–1192, Singapore, 2023. Springer Nature Singapore. ISBN 978-981-...

  31. [31]

    Non-stationary transformers: Exploring the stationarity in time series forecasting, 2023

    Yong Liu, Haixu Wu, Jianmin Wang, and Mingsheng Long. Non-stationary transformers: Exploring the stationarity in time series forecasting, 2023. URL https://arxiv.org/abs/ 2205.14415

  32. [32]

    Timer: Transformers for time series analysis at scale,

    Yong Liu, Haoran Zhang, Chenyu Li, Xiangdong Huang, Jianmin Wang, and Mingsheng Long. Timer: Generative pre-trained transformers are large time series models, 2024. URL https://arxiv.org/abs/2402.02368

  33. [33]

    A framework for the evaluation of network reliability under periodic demand.IEEE/ACM Transactions on Networking, 32(3):2495–2510, 2024

    Ali Maatouk, Fadhel Ayed, Shi Biao, Wenjie Li, Harvey Bao, and Enrico Zio. A framework for the evaluation of network reliability under periodic demand.IEEE/ACM Transactions on Networking, 32(3):2495–2510, 2024. doi: 10.1109/TNET.2024.3354516

  34. [34]

    Large language models for telecom: Forthcoming impact on the industry.IEEE Communications Magazine, 63(1):62–68, 2025

    Ali Maatouk, Nicola Piovesan, Fadhel Ayed, Antonio De Domenico, and Merouane Debbah. Large language models for telecom: Forthcoming impact on the industry.IEEE Communications Magazine, 63(1):62–68, 2025. doi: 10.1109/MCOM.001.2300473

  35. [35]

    M5 accuracy com- petition: Results, findings, and conclusions.International Journal of Forecasting, 38(4): 1346–1364, 2022

    Spyros Makridakis, Evangelos Spiliotis, and Vassilios Assimakopoulos. M5 accuracy com- petition: Results, findings, and conclusions.International Journal of Forecasting, 38(4): 1346–1364, 2022. ISSN 0169-2070. doi: https://doi.org/10.1016/j.ijforecast.2021.11.013. URL https://www.sciencedirect.com/science/article/pii/S0169207021001874. Special Issue: M5 c...

  36. [36]

    McCracken and Serena Ng and

    Michael W. McCracken and Serena Ng and. Fred-md: A monthly database for macroeconomic research.Journal of Business & Economic Statistics, 34(4):574–589, 2016. doi: 10.1080/ 07350015.2015.1086655. URLhttps://doi.org/10.1080/07350015.2015.1086655

  37. [37]

    Subseasonalclimateusa: A dataset for subseasonal forecasting and benchmarking,

    Soukayna Mouatadid, Paulo Orenstein, Genevieve Flaspohler, Miruna Oprescu, Judah Cohen, Franklyn Wang, Sean Knight, Maria Geogdzhayeva, Sam Levang, Ernest Fraenkel, and Lester Mackey. Subseasonalclimateusa: A dataset for subseasonal forecasting and benchmarking,

  38. [38]

    URLhttps://arxiv.org/abs/2109.10399

  39. [39]

    Zainib Noshad, Nadeem Javaid, Tanzila Saba, Zahid Wadud, Muhammad Qaiser Saleem, Mohammad Eid Alzahrani, and Osama E. Sheta. Fault detection in wireless sensor networks through the random forest classifier.Sensors, 19(7), 2019. ISSN 1424-8220. doi: 10.3390/ s19071568. URLhttps://www.mdpi.com/1424-8220/19/7/1568

  40. [40]

    Openairinterface 5g platform

    OpenAirInterface Software Alliance. Openairinterface 5g platform. https://www. openairinterface.org/, 2024

  41. [41]

    Nguyen, Pankaj Dayama, Renuka Sindhgatta, Prateeti Mohapatra, Harshit Kumar, Jayant Kalagnanam, Nandyala Hemachandra, and Narayan Rangaraj

    Santosh Palaskar, Vijay Ekambaram, Arindam Jati, Neelamadhav Gantayat, Avirup Saha, Seema Nagar, Nam H. Nguyen, Pankaj Dayama, Renuka Sindhgatta, Prateeti Mohapatra, Harshit Kumar, Jayant Kalagnanam, Nandyala Hemachandra, and Narayan Rangaraj. Automixer for improved multivariate time-series forecasting on business and it observability data. In Proceedings...

  42. [42]

    doi: 10.1609/aaai.v38i21.30336

    ISBN 978-1-57735-887-9. doi: 10.1609/aaai.v38i21.30336. URL https://doi.org/ 10.1609/aaai.v38i21.30336

  43. [43]

    Toward addressing training data scarcity challenge in emerging radio access networks: A survey and framework.IEEE Communications Surveys & Tutorials, 25(3):1954–1990, 2023

    Haneya Naeem Qureshi, Usama Masood, Marvin Manalastas, Syed Muhammad Asad Zaidi, Hasan Farooq, Julien Forgeat, Maxime Bouton, Shruti Bothe, Per Karlsson, Ali Rizwan, and Ali Imran. Toward addressing training data scarcity challenge in emerging radio access networks: A survey and framework.IEEE Communications Surveys & Tutorials, 25(3):1954–1990, 2023. doi...

  44. [44]

    Are language models actually useful for time series forecasting? InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

    Mingtian Tan, Mike A Merrill, Vinayak Gupta, Tim Althoff, and Thomas Hartvigsen. Are language models actually useful for time series forecasting? InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://openreview. net/forum?id=DV15UbHCY1

  45. [45]

    Lee, Artjom Joosen, Rajkarn Singh, and Martin Asenov

    William Toner, Thomas L. Lee, Artjom Joosen, Rajkarn Singh, and Martin Asenov. Performance of zero-shot time series foundation models on cloud data, 2025. URL https://arxiv.org/ abs/2502.12944. 12

  46. [46]

    Gerald Woo, Chenghao Liu, Akshat Kumar, Caiming Xiong, Silvio Savarese, and Doyen Sahoo. Unified training of universal time series forecasting transformers, 2024. URL https://arxiv.org/abs/2402.02592

  47. [47]

    Gerald Woo, Chenghao Liu, Akshat Kumar, Caiming Xiong, Silvio Savarese, and Doyen Sahoo. Unified training of universal time series forecasting transformers. In Forty-first International Conference on Machine Learning, 2024

  48. [48]

    Haixu Wu, Jiehui Xu, Jianmin Wang, and Mingsheng Long. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting, 2022. URL https://arxiv.org/abs/2106.13008

  49. [49]

    Haixu Wu, Tengge Hu, Yong Liu, Hang Zhou, Jianmin Wang, and Mingsheng Long. TimesNet: Temporal 2D-variation modeling for general time series analysis, 2023. URL https://arxiv.org/abs/2210.02186

  50. [50]

    Renjie Wu and Eamonn J. Keogh. Current Time Series Anomaly Detection Benchmarks are Flawed and are Creating the Illusion of Progress. IEEE Transactions on Knowledge & Data Engineering, 35(03):2421–2429, March 2023. ISSN 1558-2191. doi: 10.1109/TKDE.2021.3112126. URL https://doi.ieeecomputersociety.org/10.1109/TKDE.2021.3112126

  51. [51]

    Jiarui Xie, Lijun Sun, and Yaoyao Fiona Zhao. On the data quality and imbalance in machine learning-based design and manufacturing—a systematic review. Engineering, 45:105–131,

  52. [52]

    ISSN 2095-8099. doi: https://doi.org/10.1016/j.eng.2024.04.024. URL https://www.sciencedirect.com/science/article/pii/S2095809924003734

  53. [53]

    Chia-Cheng Yen, Wenting Sun, Hakimeh Purmehdi, Won Park, Kunal Rajan Deshmukh, Nishank Thakrar, Omar Nassef, and Adam Jacobs. Graph neural network based root cause analysis using multivariate time-series KPIs for wireless networks. In NOMS 2022-2022 IEEE/IFIP Network Operations and Management Symposium, pp. 1–7, 2022. doi: 10.1109/NOMS54207.2022.9789858

  54. [54]

    Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. Informer: Beyond efficient transformer for long sequence time-series forecasting, 2021. URL https://arxiv.org/abs/2012.07436

  55. [55]

    Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, and Rong Jin. Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting, 2022. URL https://arxiv.org/abs/2201.12740.

    A Analysis of TelecomTS and Comparison With Existing Datasets

    While there exists a plethora of time series datasets, most multivariate datasets lack one o...

  56. [56]

    Begin by scanning the time series for any unusual behavior: sharp spikes or drops, sustained deviations, or values inconsistent with the expected range

  57. [57]

    Consider inter-metric relationships — for example, whether high buffer utilization coincides with low throughput or high BLER

  58. [58]

    All anomalies occur at the same timestamp range, so you should identify a single set of timestamps for the anomaly event and attribute affected metrics to that period. Summarize your conclusion as follows: <conclusion > Anomaly Detected: [Yes/No] [If yes, include the following strictly formatted line:] Anomaly Timestamps: [(start_time1, end_time1), (start...

  59. [59]

    Do NOT include any other analysis or explanations.

    Network QA Prompts. These prompts are used to assess a model’s network understanding capabilities. For each prompt, we provide the KPIs from a sample and ask the model to provide an answer to the question at hand.

    Network QA Prompt

    You are an AI assistant tasked with analyzing time series data for a wi...