
arxiv: 2511.13131 · v2 · submitted 2025-11-17 · 💻 cs.AI · cs.CV · cs.ET · cs.NI

MM-Telco: Benchmarks and Multimodal Large Language Models for Telecom Applications

Pith reviewed 2026-05-17 22:03 UTC · model grok-4.3

classification 💻 cs.AI · cs.CV · cs.ET · cs.NI
keywords MM-Telco · multimodal LLMs · telecom applications · benchmarks · fine-tuning · network operations · domain adaptation

The pith

Fine-tuning multimodal LLMs on telecom-specific benchmarks leads to significant performance improvements.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MM-Telco, a suite of multimodal benchmarks and associated models for the telecommunications domain. These benchmarks cover practical tasks in network operations, management, documentation improvement, and retrieval of text and images. Experiments with various large language and vision-language models show that fine-tuning on the new dataset produces a marked increase in task performance. By highlighting weaknesses in existing models, the work points the way for more effective domain adaptation in telecom applications.

Core claim

MM-Telco provides benchmarks for text and image tasks in telecom, and models fine-tuned on this data achieve substantial gains over baselines, enabling better automation of complex reasoning and decision-making in network optimization, troubleshooting, customer support, and regulatory compliance.

What carries the argument

The MM-Telco benchmark suite consisting of various practical real-life telecom tasks that are both text-based and image-based.

If this is right

  • Improved automation of network optimization and troubleshooting processes.
  • Enhanced quality of documentation and efficient retrieval of relevant information.
  • Identification of limitations in current state-of-the-art multimodal models for guiding future research.
  • Broader adoption of LLMs for ensuring regulatory compliance in telecom operations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Successful deployment could reduce operational costs and errors in managing large-scale telecom networks.
  • Similar benchmark approaches might accelerate adaptation of AI models in other specialized industries.
  • The image-based tasks suggest potential for integrating visual network monitoring with language models.

Load-bearing premise

That the new tasks and benchmarks reflect actual real-life telecom scenarios, and that gains seen in controlled tests will carry over to real-world network improvements.

What would settle it

Running the fine-tuned models on live telecom network data and measuring whether they outperform standard tools at resolving actual operational issues within a set time frame.

Figures

Figures reproduced from arXiv: 2511.13131 by Abdelaali Chaoub, Anshul Kumar, Apu Chakraborty, Ashutosh Modi, Gagan Raj Gupta, Manish Rai, Moyank Giri, M. V. Kiran Sooraj, Soumajit Pramanik, Sunny Kumar, Yashwanth Holla.

Figure 1: Key components of our evaluation framework for enabling Multimodal Telecom applications.
Figure 2: Agentic Pipeline for MCQs generation
Figure 3: Distribution of MCQ questions across 3GPP
Figure 4: Multi-hop MCQ For scenario-based filter generation from PCAP analysis (
Figure 5: Image Based MCQ RQ3: Private Fine-Tuned LLMs for Telecom: To address RQ3, we evaluated the impact of fine-tuning telecom-specific LLMs on performance and data privacy. As shown in
Figure 6: Examples of Multiple Choice Questions, Long Answer Questions, and Named Entity Classification
Figure 7: Examples of Image-based Multiple Choice Questions and Long Answer Questions from the image category.
Figure 9: Example of Scenario-Based Filter Generation Benchmark.
Figure 10: Illustration of how the model generates a sequence diagram from a textual prompt using Mermaid code.
Figure 11: Model’s capability to correct inaccurate sequence diagrams using Mermaid code.
Figure 12: Model generating accurate packet diagrams from bit-level header specifications using Mermaid code.
Figure 13: Correction of an incomplete TCP packet header diagram using Mermaid code.
Figure 14: Example of a Multi-Modal chatbot designed for 3GPP documents.
Original abstract

Large Language Models (LLMs) have emerged as powerful tools for automating complex reasoning and decision-making tasks. In telecommunications, they hold the potential to transform network optimization, automate troubleshooting, enhance customer support, and ensure regulatory compliance. However, their deployment in telecom is hindered by domain-specific challenges that demand specialized adaptation. To overcome these challenges and to accelerate the adaptation of LLMs for telecom, we propose MM-Telco, a comprehensive suite of multimodal benchmarks and models tailored for the telecom domain. The benchmark introduces various tasks (both text based and image based) that address various practical real-life use cases such as network operations, network management, improving documentation quality, and retrieval of relevant text and images. Further, we perform baseline experiments with various LLMs and VLMs. The models fine-tuned on our dataset exhibit a significant boost in performance. Our experiments also help analyze the weak areas in the working of current state-of-art multimodal LLMs, thus guiding towards further development and research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces MM-Telco, a suite of multimodal benchmarks and fine-tuned models for telecom applications. It defines text- and image-based tasks covering network operations, management, documentation quality, and retrieval, reports baseline results with various LLMs and VLMs, and asserts that fine-tuning on the new dataset produces a significant performance boost while highlighting weaknesses in current multimodal models.

Significance. If the benchmarks are shown to be representative of real operator workflows and the performance gains are supported by detailed quantitative evaluations with proper baselines and statistical controls, the work could supply useful resources for domain adaptation of multimodal LLMs in telecommunications, aiding practical applications in network optimization and support.

major comments (2)
  1. Abstract: the assertion that 'the models fine-tuned on our dataset exhibit a significant boost in performance' is presented without any numerical metrics, tables, baseline comparisons, error bars, or evaluation protocol details, leaving the central empirical claim without visible quantitative support.
  2. Task and benchmark construction sections: the tasks are stated to address 'practical real-life use cases' yet the manuscript supplies no description of derivation from operator logs, expert annotation procedures, comparison to production telemetry, or external validation, which is load-bearing for claims that measured gains will translate to deployable improvements.
minor comments (1)
  1. Abstract: consider adding one sentence on dataset scale or the specific base models used in the baselines to give readers immediate context.
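
The first major comment asks for baseline comparisons and error bars. As a minimal sketch of what such support could look like, the Python below computes an accuracy gain with a paired percentile-bootstrap confidence interval. It assumes per-question 0/1 correctness is available for both the baseline and the fine-tuned model on the same questions. The function names and the outcome lists are hypothetical illustrations, not the paper's evaluation protocol or results.

```python
import random


def accuracy(correct):
    """Fraction of items answered correctly (correct is a list of 0/1)."""
    return sum(correct) / len(correct)


def paired_bootstrap_ci(base, tuned, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for accuracy(tuned) - accuracy(base).

    base and tuned hold per-question 0/1 outcomes on the SAME questions,
    so each resample draws question indices once and applies them to both.
    """
    if len(base) != len(tuned) or not base:
        raise ValueError("need two non-empty, equal-length outcome lists")
    rng = random.Random(seed)
    n = len(base)
    diffs = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        b = sum(base[i] for i in idx) / n
        t = sum(tuned[i] for i in idx) / n
        diffs.append(t - b)
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_resamples)]
    hi = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return accuracy(tuned) - accuracy(base), lo, hi


# Hypothetical outcomes for 200 MCQs; NOT results from the paper.
base_outcomes = [1] * 90 + [0] * 110    # 45% baseline accuracy
tuned_outcomes = [1] * 130 + [0] * 70   # 65% fine-tuned accuracy

delta, lo, hi = paired_bootstrap_ci(base_outcomes, tuned_outcomes)
print(f"accuracy gain = {delta:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

Resampling question indices once per iteration keeps the comparison paired, which usually gives a tighter interval than bootstrapping each model independently. An interval that excludes zero would be one form of the quantitative support the referee requests.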

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments highlight opportunities to strengthen the presentation of quantitative results and the justification for benchmark relevance. We address each point below and have revised the manuscript to improve clarity and support for our claims.

Point-by-point responses
  1. Referee: Abstract: the assertion that 'the models fine-tuned on our dataset exhibit a significant boost in performance' is presented without any numerical metrics, tables, baseline comparisons, error bars, or evaluation protocol details, leaving the central empirical claim without visible quantitative support.

    Authors: We agree that the abstract should provide immediate quantitative support for the performance claims. In the revised version, we have updated the abstract to include key numerical results from our experiments, such as specific accuracy and F1-score improvements for the fine-tuned models relative to the zero-shot and few-shot baselines. We also added a brief reference to the evaluation protocol and main results tables in the experimental section. The detailed metrics, baselines, and statistical details remain in the body of the paper. revision: yes

  2. Referee: Task and benchmark construction sections: the tasks are stated to address 'practical real-life use cases' yet the manuscript supplies no description of derivation from operator logs, expert annotation procedures, comparison to production telemetry, or external validation, which is load-bearing for claims that measured gains will translate to deployable improvements.

    Authors: We acknowledge the value of explicitly documenting the benchmark construction process. The tasks were designed to capture representative telecom scenarios drawn from publicly available industry documentation, standards, and common operational challenges. In the revision, we have added a dedicated subsection describing the task formulation process, including the use of domain-expert review for annotation guidelines and alignment with typical network management workflows. Direct use of proprietary operator logs or production telemetry was not feasible due to data access constraints; however, the added details clarify how the tasks reflect real-world use cases and support the observed performance gains. revision: partial

Circularity Check

0 steps flagged

No significant circularity in benchmark construction or performance reporting

full rationale

The paper introduces a new multimodal benchmark suite (MM-Telco) with text and image tasks for telecom use cases and reports empirical results from baseline LLMs/VLMs plus fine-tuned models showing performance gains on those tasks. No derivation chain, equations, or first-principles claims exist that reduce by construction to the paper's own inputs; the reported boosts are direct experimental measurements on the defined dataset rather than predictions or self-definitional results. This is a standard benchmark paper with self-contained empirical content and no load-bearing self-citations or ansatz smuggling that would trigger the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper relies on standard machine-learning assumptions about fine-tuning effectiveness and benchmark validity rather than introducing new fitted parameters, axioms, or invented entities.

axioms (1)
  • domain assumption: Fine-tuning general-purpose LLMs and VLMs on domain-specific datasets yields measurable performance gains on related tasks.
    Invoked when claiming the boost from fine-tuning on the new dataset.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Foundation/RealityFromDistinction reality_from_one_distinction (tag: unclear)

    Relation between the paper passage and the cited Recognition theorem.

    The benchmark introduces various tasks (both text based and image based) that address various practical real-life use cases such as network operations, network management, improving documentation quality, and retrieval of relevant text and images. ... The models fine-tuned on our dataset exhibit a significant boost in performance.

  • IndisputableMonolith/Cost/FunctionalEquation washburn_uniqueness_aczel (tag: unclear)

    Relation between the paper passage and the cited Recognition theorem.

    We fine-tune a Llama model Llama-VL-Telco that is capable of generating and updating the telecom images when given the suitable prompt.

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
