TeleCom-Bench: How Far Are Large Language Models from Industrial Telecommunication Applications?
Pith reviewed 2026-05-20 11:19 UTC · model grok-4.3
The pith
Large language models reach roughly 90 percent accuracy on telecom language-understanding tasks but only about 30 percent on solution generation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Evaluations on TeleCom-Bench reveal a universal Execution Wall: models reach roughly 90 percent accuracy on linguistic interface tasks such as intent recognition and entity extraction yet fall to approximately 30 percent on procedural execution tasks such as solution generation, showing that current LLMs function competently as diagnosticians but fail as field engineers.
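As a rough illustration of how such a gap could be quantified, the sketch below averages per-task accuracies within each task group and reports the difference. The grouping uses only the tasks the abstract names as examples; the accuracy values and the function name `execution_gap` are hypothetical placeholders, not results from the paper.

```python
from statistics import mean

# Example tasks named in the abstract for each group. How the paper assigns
# the remaining tasks to groups is not stated there, so they are left out.
LINGUISTIC = ("intent_recognition", "entity_extraction")
PROCEDURAL = ("solution_generation",)


def execution_gap(accuracy: dict[str, float]) -> float:
    """Mean linguistic-task accuracy minus mean procedural-task accuracy."""
    linguistic = mean(accuracy[task] for task in LINGUISTIC)
    procedural = mean(accuracy[task] for task in PROCEDURAL)
    return linguistic - procedural


# Hypothetical scores for a single model (illustrative only).
scores = {
    "intent_recognition": 0.92,
    "entity_extraction": 0.88,
    "solution_generation": 0.30,
}
gap = execution_gap(scores)
print(f"gap = {gap:.2f}")
```

Reporting this difference per model, rather than only the two group averages, would make it easier to see whether the gap is in fact similar across all evaluated models.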
What carries the argument
TeleCom-Bench, a benchmark of 12 evaluation sets and 22,678 samples that separates multi-dimensional knowledge comprehension (via knowledge-graph synthesis of protocols and product data) from end-to-end knowledge application on six authentic network-agent tasks.
If this is right
- Telecom operators can adopt the benchmark to track whether fine-tuned models cross the threshold for safe deployment in fault-maintenance workflows.
- Developers should allocate alignment effort to procedural reasoning rather than further gains on language-understanding subtasks.
- The benchmark supplies concrete diagnostics that can steer domain-specific continued pre-training or tool-augmented agent designs.
- Similar execution gaps are likely to appear in any vertical that pairs documentation with sequential equipment actions.
Where Pith is reading between the lines
- The same comprehension-versus-execution split may limit LLM agents in other equipment-heavy sectors such as power-grid operations or semiconductor manufacturing.
- Extending the benchmark with closed-loop simulators would let researchers measure whether generated solutions actually resolve the reported faults.
- Hybrid architectures that pair LLMs with rule-based verification modules could close the gap faster than scaling alone.
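To make the last point concrete, here is a minimal sketch of what a rule-based verification module might look like when placed behind an LLM. The action names are borrowed from the Figure 7 example in the paper's appendix; the catalog itself, the ordering rule, and the function name `verify_plan` are assumptions made for illustration, not part of TeleCom-Bench.

```python
# Hypothetical action catalog (names taken from the paper's Figure 7 example).
ALLOWED_ACTIONS = {
    "IQ Fragment Cleanup",
    "Alarm Recovery Check",
    "Notify Human For Handling",
}

# Hypothetical ordering rule: the first element should come before the second.
MUST_PRECEDE = [("IQ Fragment Cleanup", "Alarm Recovery Check")]


def verify_plan(steps: list[str]) -> list[str]:
    """Return rule violations for a generated plan; an empty list means it passes."""
    problems = [f"unknown action: {s}" for s in steps if s not in ALLOWED_ACTIONS]
    for before, after in MUST_PRECEDE:
        # Compare the first occurrence of each action in the plan.
        if before in steps and after in steps and steps.index(before) > steps.index(after):
            problems.append(f"'{after}' appears before '{before}'")
    return problems


print(verify_plan(["IQ Fragment Cleanup", "Alarm Recovery Check"]))
print(verify_plan(["Alarm Recovery Check", "Reboot Cell"]))
```

A verifier of this kind only rejects plans that break explicit rules; it cannot confirm that an accepted plan resolves the fault, which is where the closed-loop simulators mentioned above would come in.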
Load-bearing premise
The six core tasks drawn from live network agent workflows are assumed to capture the essential skills required for production-ready telecom agents.
What would settle it
A controlled test in which any of the evaluated models achieves sustained accuracy above 60 percent on solution-generation items while using only the supplied equipment documentation and without external tools would falsify the claimed universal Execution Wall.
Original abstract
While Large Language Models have achieved remarkable integration in various vertical scenarios, their deployment in the telecommunications domain remains exploratory due to the lack of a standardized evaluation framework. Current telecom benchmarks primarily focus on static, foundational knowledge and isolated atomic skills, neglecting the equipment-specific documentation and end-to-end industrial workflows essential for real-world production systems. To bridge this gap, we present TeleCom-Bench, a comprehensive benchmark comprising 12 evaluation sets with 22,678 curated samples, which evaluates LLMs across a synergistic hierarchy: (1) Multi-dimensional Knowledge Comprehension, which integrates telecommunication fundamentals, 3GPP protocols, and 5G network architecture with proprietary product knowledge across wired, core, and wireless networks via knowledge graph-driven synthesis; and (2) End-to-End Knowledge Application, which formalizes six core tasks on authentic trajectories from live network agent workflows, including intent recognition, entity extraction, event verification, tool invocation, root cause analysis, and solution generation, across network optimization and fault maintenance scenarios. Evaluations of eight state-of-the-art LLMs reveal a universal Execution Wall: while models achieve 90% accuracy in linguistic interface tasks such as intent recognition and entity extraction, performance collapses to approximately 30% in procedural execution tasks like solution generation. This capability gap demonstrates that current LLMs function competently as diagnosticians but fail as field engineers. TeleCom-Bench provides standardized diagnostics to precisely pinpoint this deficit, offering actionable guidance for domain-specific alignment toward production-ready telecom agents. The dataset and evaluation code have been released at https://github.com/ZTE-AICloud/TeleCom-Bench.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces TeleCom-Bench, a benchmark with 12 evaluation sets and 22,678 curated samples that assesses LLMs on a hierarchy of multi-dimensional knowledge comprehension (telecom fundamentals, 3GPP protocols, 5G architecture, and proprietary product knowledge) and end-to-end knowledge application. The latter formalizes six core tasks—intent recognition, entity extraction, event verification, tool invocation, root cause analysis, and solution generation—drawn from authentic live network agent workflows in optimization and fault maintenance. Evaluations of eight state-of-the-art LLMs show ~90% accuracy on linguistic interface tasks but collapse to ~30% on procedural execution tasks such as solution generation, leading to the claim of a universal 'Execution Wall' where LLMs function as competent diagnosticians but fail as field engineers. The dataset and code are released publicly.
Significance. If the benchmark tasks are representative of industrial requirements, the reported performance gap supplies concrete, standardized diagnostics for LLM limitations in telecom and offers actionable guidance for domain-specific alignment toward production-ready agents. The public release of the dataset and evaluation code is a clear strength that supports reproducibility and community follow-up work.
major comments (2)
- [Abstract / End-to-End Knowledge Application description] End-to-End Knowledge Application (six core tasks): The central 'Execution Wall' interpretation—that the 90%-to-30% gap demonstrates LLMs 'fail as field engineers'—rests on the assumption that the six tasks drawn from live trajectories capture the essential capabilities for production-ready telecom agents. The manuscript provides no reported expert validation, coverage analysis against full job requirements (e.g., novel equipment states, real-time uncertainty, or physical-layer coordination), or comparison to broader industrial workflows; this is load-bearing for generalizing the observed collapse beyond the specific benchmark.
- [Abstract] Abstract (sample curation): The concrete accuracy numbers and the 22,678-sample count are presented without details on curation criteria, inter-annotator agreement, or exclusion rules for the trajectories. This weakens independent verification of the exact 90-to-30 gap and its attribution to model capability rather than benchmark construction choices.
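For readers unfamiliar with the inter-annotator agreement the referee asks for, the sketch below computes Cohen's kappa for two annotators labelling the same items. The paper does not say which agreement statistic, if any, was used; kappa is chosen here only as a common option, and the label values are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical keep/drop decisions on six candidate trajectories.
annotator_1 = ["keep", "keep", "drop", "keep", "drop", "drop"]
annotator_2 = ["keep", "keep", "drop", "drop", "drop", "keep"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

Publishing a figure like this alongside the exclusion rules would help readers separate benchmark-construction effects from differences in model capability.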
minor comments (2)
- Consider adding a summary table that reports per-LLM, per-task accuracies (including confidence intervals or variance across runs) to make the 'universal' gap claim easier to inspect at a glance.
- Clarify the exact definition and scoring rubric for 'solution generation' and 'root cause analysis' tasks, as these appear to be the primary drivers of the reported performance drop.
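One way the suggested summary table could attach uncertainty to each accuracy is a Wilson score interval, sketched below. The model names, task names, and counts are hypothetical placeholders, and the paper does not state which interval method it would use.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need total > 0 and 0 <= correct <= total")
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


# Hypothetical (correct, total) counts per model and task.
results = {
    ("model_a", "intent_recognition"): (90, 100),
    ("model_a", "solution_generation"): (30, 100),
}
for (model, task), (correct, total) in results.items():
    low, high = wilson_interval(correct, total)
    print(f"{model:8} {task:20} {correct / total:.2f} [{low:.2f}, {high:.2f}]")
```

The Wilson interval behaves sensibly near 0 and 1 and for small per-task sample sizes, which matters if some of the 12 evaluation sets are much smaller than others.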
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments on our manuscript. We address each major comment point by point below, indicating where revisions will be made to improve clarity and transparency.
- Referee: [Abstract / End-to-End Knowledge Application description] End-to-End Knowledge Application (six core tasks): The central 'Execution Wall' interpretation (that the 90%-to-30% gap demonstrates LLMs 'fail as field engineers') rests on the assumption that the six tasks drawn from live trajectories capture the essential capabilities for production-ready telecom agents. The manuscript provides no reported expert validation, coverage analysis against full job requirements (e.g., novel equipment states, real-time uncertainty, or physical-layer coordination), or comparison to broader industrial workflows; this is load-bearing for generalizing the observed collapse beyond the specific benchmark.
  Authors: We agree that the manuscript does not report a formal expert validation study or quantitative coverage analysis against the full spectrum of job requirements such as novel equipment states or physical-layer coordination. The six tasks were formalized directly from authentic live network agent workflows in optimization and fault maintenance, as described in the paper. To strengthen the grounding of the 'Execution Wall' claim, we will revise the manuscript to include additional details on the workflow analysis process and how these tasks map to core industrial procedures. This will better contextualize the scope of the observed performance gap without overgeneralizing beyond the benchmark. Revision: yes.
- Referee: [Abstract] Abstract (sample curation): The concrete accuracy numbers and the 22,678-sample count are presented without details on curation criteria, inter-annotator agreement, or exclusion rules for the trajectories. This weakens independent verification of the exact 90-to-30 gap and its attribution to model capability rather than benchmark construction choices.
  Authors: We concur that explicit details on curation criteria, inter-annotator agreement, and exclusion rules are necessary for independent verification. The current manuscript provides only a high-level description of the 22,678 curated samples. In the revised version, we will expand the relevant sections to document the curation process, including agreement metrics and trajectory filtering rules, thereby allowing readers to more rigorously assess the benchmark construction and the reported performance differences. Revision: yes.
Circularity Check
No circularity: empirical benchmark with no derivations or fitted predictions
The paper constructs TeleCom-Bench as a new test set from live network trajectories and measures LLM accuracy on six tasks; the central claim of an Execution Wall is a direct empirical observation on external models rather than any derivation, equation, or parameter fit that reduces to the authors' inputs. No self-citations, ansatzes, or uniqueness theorems are invoked to justify the results. The benchmark is self-contained against external models and the newly released dataset.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The 22,678 curated samples and six core tasks drawn from live network agent workflows accurately represent essential industrial skills.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: Evaluations of eight state-of-the-art LLMs reveal a universal Execution Wall: while models achieve 90% accuracy in linguistic interface tasks such as intent recognition and entity extraction, performance collapses to approximately 30% in procedural execution tasks like solution generation.
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, theorem reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: End-to-End Knowledge Application, which formalizes six core tasks on authentic trajectories from live network agent workflows
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Appendix excerpts
- A.1 Question Format Specifications in TeleCom-Bench: TeleCom-Bench is a benchmark dataset...
- NR Base Station Super Cell CP Out-of-Service
- Figure 7: Model responses to a complex fault resolution task. Generalist models produce unstructured advice or hallucinate...
  - Standard Answer: step1.[IQ Fragment Cleanup] step2.[Alarm Recovery Check] step3.[Notify Human For Handling]
  - (model name not recovered): [IQ Fragment Cleanup&defragmentIQ] [Alarm Recovery Check]
  - glm4.7: Step 1: [Alarm Recovery Check] &observation and determination duration=3 minutes& Step 2: [IQ Fragment Cleanup] &method=defragmentIQ&
  - grok 4.1: [IQ Fragment Cleanup &defragmentIQNR&] [Alarm Recovery Check]
  - deepseek v3.2: [Alarm Recovery Check] [IQ Fragment Cleanup&defragmentIQ] [Alarm Recovery Check&3 minutes]
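The step lists in Figure 7 lend themselves to a simple sequence comparison. The sketch below extracts the bracketed action names from an answer and scores how much of the standard answer's ordered step sequence a model reproduces, using the longest common subsequence. This is a hypothetical metric for illustration; the paper's actual scoring rubric for solution generation is not described in this excerpt, and the function names are invented here.

```python
import re


def extract_actions(answer: str) -> list[str]:
    """Pull action names out of bracketed steps, dropping any '&...' parameters."""
    return [inside.split("&", 1)[0].strip()
            for inside in re.findall(r"\[([^\]]+)\]", answer)]


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two step lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def step_overlap(predicted: list[str], gold: list[str]) -> float:
    """Fraction of gold steps recovered in the right relative order."""
    return lcs_length(predicted, gold) / len(gold) if gold else 0.0


# Strings copied from the Figure 7 excerpt.
standard = ("step1.[IQ Fragment Cleanup] step2.[Alarm Recovery Check] "
            "step3.[Notify Human For Handling]")
deepseek = ("[Alarm Recovery Check] [IQ Fragment Cleanup&defragmentIQ] "
            "[Alarm Recovery Check&3 minutes]")

gold_steps = extract_actions(standard)
pred_steps = extract_actions(deepseek)
print(pred_steps == gold_steps, round(step_overlap(pred_steps, gold_steps), 3))
```

An order-aware partial score of this kind distinguishes a plan that misses only the final escalation step from one that gets the whole procedure wrong, which a strict exact-match criterion would treat identically.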