arxiv: 2512.12400 · v2 · submitted 2025-12-13 · 💻 cs.NI

Agentic AI for 6G: A New Paradigm for Autonomous RAN Security Compliance

Sotiris Chatzimiltis, Mahdi Boloursaz Mashhadi, Mohammad Shojafar, Merouane Debbah, Rahim Tafazolli

Pith reviewed 2026-05-16 22:40 UTC · model grok-4.3

classification 💻 cs.NI

keywords Agentic AI · 6G RAN · Security Compliance · LLM Agents · RAG Pipeline · O-RAN Standards · Autonomous Enforcement · Telecom Security

The pith

LLM-based AI agents integrated with RAG pipelines can autonomously enforce security compliance in 6G radio access networks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes a framework that uses LLM-powered AI agents combined with retrieval-augmented generation to automate the process of checking and enforcing security compliance in next-generation RANs. Traditional manual methods struggle with the rapid evolution of standards like those from O-RAN and 3GPP, creating opportunities for intelligent automation. Through a case study, the framework demonstrates how an agent can review configuration files, produce explanations, and recommend fixes for compliance issues. If effective, such systems could transform how security is maintained in complex 6G environments by making compliance enforcement faster and more adaptive.

Core claim

The paper establishes that LLM-based AI agents with RAG integration enable intelligent and autonomous enforcement of security compliance by assessing configuration files against O-RAN Alliance and 3GPP standards, generating explainable justifications, and proposing automated remediation where necessary.

What carries the argument

An LLM-based AI agent integrated with a retrieval-augmented generation (RAG) pipeline, which retrieves relevant standard information to support accurate compliance decisions and explanations.
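
A minimal sketch of the retrieval step such a pipeline depends on, assuming a toy in-memory corpus and bag-of-words cosine similarity in place of a real embedding model and vector store. The passages, filenames, and function names are hypothetical illustrations; they are not taken from the paper or from any O-RAN or 3GPP document.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for chunks of standards text (not real specification wording).
PASSAGES = [
    {"filename": "spec_a.txt", "text": "Management interfaces shall use TLS version 1.2 or higher."},
    {"filename": "spec_b.txt", "text": "Password authentication for SSH should be disabled in favour of keys."},
    {"filename": "spec_c.txt", "text": "Audit logging shall be enabled for all administrative actions."},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, passages, k=1):
    """Return up to k passages with non-zero similarity to the query, best first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(p["text"]))), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored[:k] if score > 0]


def build_prompt(config_snippet, retrieved):
    """Place retrieved text, tagged with its filename, ahead of the configuration."""
    context = "\n".join(f"[{p['filename']}] {p['text']}" for p in retrieved)
    return (
        f"Standards context:\n{context}\n\n"
        f"Configuration:\n{config_snippet}\n\n"
        "Assess compliance and cite the filename you relied on."
    )


hits = retrieve("which TLS version is required on management interfaces", PASSAGES)
prompt = build_prompt("tls:\n  min_version: 1.0", hits)
```

Tagging each passage with its filename is what lets a downstream justification point to a specific source, which is the property the explainability claim relies on.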

If this is right

Agents can automatically assess network configuration files for compliance with evolving telecom standards.
Systems generate explainable justifications for compliance assessments.
Automated remediation suggestions can be produced for identified non-compliance issues.
The framework addresses challenges such as model hallucinations and vendor inconsistencies in standards interpretation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Such agentic systems could extend to real-time monitoring and dynamic adjustment of network parameters beyond static configuration checks.
Development of telecom-specific LLMs would likely improve reliability over general-purpose models for this domain.
Standardized benchmarks for evaluating these agents would be needed to compare performance across different implementations.

Load-bearing premise

Current general-purpose large language models can interpret complex and evolving telecommunications standards accurately enough to make reliable compliance decisions without excessive hallucinations.

What would settle it

Running the proposed agent on a set of deliberately misconfigured files with known compliance status and measuring whether it correctly identifies violations and provides accurate, non-hallucinated justifications.
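
Such a test could be scored along the following lines. This is a sketch under stated assumptions: each test file is labelled with the set of fields known to violate a requirement, the agent's output has already been parsed into a set of flagged fields, and all file names and field names are made up for illustration.

```python
def score_detection(ground_truth, predictions):
    """Compare flagged fields with known violations.

    Both arguments map a file name to a set of violated field names.
    """
    tp = fp = fn = 0
    for name, truth in ground_truth.items():
        flagged = predictions.get(name, set())
        tp += len(truth & flagged)   # real violations the agent found
        fp += len(flagged - truth)   # fields flagged without a real violation
        fn += len(truth - flagged)   # real violations the agent missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}


# Hypothetical labelled set: two misconfigured files and one compliant control.
truth = {
    "cfg_01.yaml": {"tls.min_version", "ssh.password_auth"},
    "cfg_02.yaml": set(),
    "cfg_03.yaml": {"logging.audit"},
}
# Hypothetical parsed agent output.
agent = {
    "cfg_01.yaml": {"tls.min_version"},   # misses ssh.password_auth
    "cfg_02.yaml": {"ntp.server"},        # false alarm on the compliant file
    "cfg_03.yaml": {"logging.audit"},
}

result = score_detection(truth, agent)
print(result)
```

Including known-compliant control files matters here, because an agent that flags everything would otherwise score perfect recall.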

Figures

Figures reproduced from arXiv: 2512.12400 by Mahdi Boloursaz Mashhadi, Merouane Debbah, Mohammad Shojafar, Rahim Tafazolli, Sotiris Chatzimiltis.

**Figure 2.** Proposed framework for intelligent security compliance in next-generation RANs.

**Figure 3.** Case study workflow of static compliance implemented in N8N.

**Figure 4.** Comparative analysis of case study under different retrieval configurations.

Original abstract

Agentic AI systems are emerging as powerful tools for automating complex, multi-step tasks across various industries. One such industry is telecommunications, where the growing complexity of next-generation radio access networks (RANs) opens up numerous opportunities for applying these systems. Securing the RAN is a key area, particularly through automating the security compliance process, as traditional methods often struggle to keep pace with evolving specifications and real-time changes. In this article, we propose a framework that leverages LLM-based AI agents integrated with a retrieval-augmented generation (RAG) pipeline to enable intelligent and autonomous enforcement of security compliance. An initial case study demonstrates how an agent can assess configuration files for compliance with O-RAN Alliance and 3GPP standards, generate explainable justifications, and propose automated remediation if needed. We also highlight key challenges such as model hallucinations and vendor inconsistencies, along with considerations like agent security, transparency, and system trust. Finally, we outline future directions, emphasizing the need for telecom-specific LLMs and standardized evaluation frameworks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a high-level framework proposal for LLM agents plus RAG to automate RAN security compliance checks, but it rests on an untested assumption that current models can handle the standards reliably.

The paper's main contribution is a conceptual framework that uses LLM-based agents with a RAG pipeline to read RAN configuration files, check them against O-RAN and 3GPP rules, generate explanations, and suggest fixes. The authors map the growing complexity of 6G security compliance and show one worked example of the agent in action. They also list practical obstacles such as hallucinations, vendor inconsistencies, and the need for transparency. That framing is clear and directly relevant to people building network automation tools. The work applies standard agentic techniques to a new domain without claiming a novel algorithm, which keeps expectations realistic. The case study gives a concrete sense of the intended workflow.

The central weakness is the missing validation. The paper acknowledges hallucinations as a key risk yet provides no error rates, no expert baseline comparisons, and no tests on held-out standards documents. Without those numbers the claim that the system can produce accurate, autonomous decisions stays unproven. The framework itself does not introduce circular fitting or invented parameters; it points to external specifications as ground truth.

This paper is aimed at researchers in AI for wireless networks who want to see early directions for agent-based compliance. Readers looking for implemented systems or benchmark results will find the description useful as a starting point but will need follow-up work. It deserves peer review because the problem is timely for 6G and the framework is described coherently enough that referees can ask for the quantitative experiments that are currently absent.

Referee Report

3 major / 2 minor

Summary. The paper proposes a framework for autonomous RAN security compliance in 6G networks that integrates LLM-based AI agents with a retrieval-augmented generation (RAG) pipeline. The framework is intended to interpret evolving O-RAN Alliance and 3GPP standards, assess configuration files for compliance, generate explainable justifications, and suggest automated remediation. An initial qualitative case study is presented to illustrate the approach, alongside discussion of challenges including model hallucinations, vendor inconsistencies, agent security, and the need for telecom-specific LLMs and standardized evaluation frameworks.

Significance. If the reliability assumptions hold and quantitative validation is added, the work could contribute to automating complex compliance tasks in next-generation networks, potentially reducing manual effort in a domain where standards evolve rapidly. The conceptual integration of agentic AI with RAG for explainable decisions aligns with emerging trends in network automation, though the absence of metrics limits immediate impact.

major comments (3)

[Case study] Case study section: The evaluation consists of a single qualitative example of configuration assessment with no reported quantitative metrics such as accuracy, hallucination rate, inter-annotator agreement, or comparison against expert baselines on held-out 3GPP/O-RAN artifacts. This directly undermines the central claim that the agents enable reliable autonomous enforcement.
[Framework] Framework description (likely §3): The claim that LLM agents with RAG can produce accurate compliance decisions rests on the unquantified assumption that general-purpose LLMs can interpret complex, evolving standards without unacceptable hallucination rates, yet the manuscript lists hallucinations as a key challenge without presenting mitigation evidence or error analysis.
[Challenges and future directions] Challenges and future directions section: The discussion of hallucinations and the call for telecom-specific LLMs and standardized evaluation frameworks is appropriate but remains high-level; no concrete test protocol, dataset, or baseline comparison is defined that would allow falsification of the reliability assumption.

minor comments (2)

[Abstract/Introduction] The abstract and introduction reference 'vendor inconsistencies' as a challenge but provide no further elaboration or examples in the main text.
[Framework] Notation for agent components (e.g., how the RAG pipeline interfaces with the decision agent) could be clarified with a diagram or pseudocode to improve reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. The comments highlight important aspects of evaluation rigor that we will address in the revision. Below we respond point by point to the major comments.

Referee: [Case study] Case study section: The evaluation consists of a single qualitative example of configuration assessment with no reported quantitative metrics such as accuracy, hallucination rate, inter-annotator agreement, or comparison against expert baselines on held-out 3GPP/O-RAN artifacts. This directly undermines the central claim that the agents enable reliable autonomous enforcement.

Authors: We agree that the presented case study is limited to a single qualitative illustration. The manuscript introduces a new conceptual framework and uses the example primarily to demonstrate workflow rather than to claim empirical reliability. In the revised manuscript we will expand the case study section to include at least two additional configuration scenarios drawn from public O-RAN specifications and will report basic quantitative indicators obtained via expert review, such as compliance detection accuracy and justification completeness scores. revision: yes
Referee: [Framework] Framework description (likely §3): The claim that LLM agents with RAG can produce accurate compliance decisions rests on the unquantified assumption that general-purpose LLMs can interpret complex, evolving standards without unacceptable hallucination rates, yet the manuscript lists hallucinations as a key challenge without presenting mitigation evidence or error analysis.

Authors: The manuscript does not claim that general-purpose LLMs currently deliver acceptable accuracy; it presents the agentic RAG approach as a research direction while explicitly identifying hallucinations as an open challenge. To clarify this distinction we will revise the framework description to include a dedicated subsection on mitigation strategies (e.g., retrieval verification, multi-agent debate, and human-in-the-loop checkpoints) supported by citations to recent literature on LLM reliability. revision: partial
Referee: [Challenges and future directions] Challenges and future directions section: The discussion of hallucinations and the call for telecom-specific LLMs and standardized evaluation frameworks is appropriate but remains high-level; no concrete test protocol, dataset, or baseline comparison is defined that would allow falsification of the reliability assumption.

Authors: We concur that the future-directions discussion is high-level. In the revision we will add a concrete evaluation protocol subsection that specifies (1) a dataset structure based on publicly available 3GPP and O-RAN configuration artifacts, (2) metrics including hallucination rate measured against expert annotations, and (3) baseline comparisons using both general-purpose and domain-adapted models. This will enable future falsification of the reliability claims. revision: yes
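
One way the retrieval verification and hallucination-rate ideas in these responses could be made measurable is a grounding check on each justification. The sketch below assumes a justification carries a quoted passage and a cited filename, and counts it as ungrounded when the quote does not appear (after whitespace and case normalization) in any passage retrieved from that file. This is an assumed proxy, not the metric the authors define, and it would complement expert annotation rather than replace it; all data shown is hypothetical.

```python
import re


def normalize(text):
    """Collapse whitespace and lowercase so line wrapping does not break matching."""
    return re.sub(r"\s+", " ", text).strip().lower()


def is_grounded(justification, retrieved):
    """True if the quoted text occurs in a passage retrieved from the cited file.

    justification: dict with 'filename' and 'quote'.
    retrieved: dict mapping filename -> list of retrieved passage strings.
    """
    quote = normalize(justification["quote"])
    if not quote:
        return False
    passages = retrieved.get(justification["filename"], [])
    return any(quote in normalize(p) for p in passages)


def ungrounded_rate(justifications, retrieved):
    """Fraction of justifications whose quote cannot be found in the cited source."""
    if not justifications:
        return 0.0
    bad = sum(not is_grounded(j, retrieved) for j in justifications)
    return bad / len(justifications)


# Hypothetical retrieved context and agent justifications.
retrieved = {"spec_a.txt": ["Management interfaces shall use TLS\nversion 1.2 or higher."]}
justifications = [
    {"filename": "spec_a.txt", "quote": "shall use TLS version 1.2 or higher"},  # grounded
    {"filename": "spec_a.txt", "quote": "shall use TLS version 1.3 only"},       # altered quote
    {"filename": "spec_z.txt", "quote": "audit logging shall be enabled"},       # file never retrieved
]

rate = ungrounded_rate(justifications, retrieved)
print(f"ungrounded justifications: {rate:.2f}")
```

Exact substring matching is deliberately strict: it will reject faithful paraphrases, so the resulting rate is better read as an upper bound that flags items for human review.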

Circularity Check

0 steps flagged

No circularity in conceptual framework proposal

The paper proposes a high-level framework for LLM-based agents with RAG to enforce RAN security compliance against O-RAN and 3GPP standards. It contains no mathematical derivation chain, fitted parameters, predictions, or equations that reduce to inputs by construction. The single qualitative case study is illustrative rather than quantitative, and external standards function as independent inputs. No self-citation load-bearing steps, uniqueness theorems, or ansatzes appear in any derivation; the central claim is a system architecture proposal, not a result derived from prior fitted values or self-referential definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The proposal rests on the unproven assumption that general LLMs can serve as reliable interpreters of telecom standards; no free parameters are fitted and no new physical entities are introduced.

axioms (1)

domain assumption: LLM-based agents can accurately assess compliance with O-RAN and 3GPP security specifications
Invoked throughout the framework description and case study without supporting evidence or error bounds.

invented entities (1)

Autonomous RAN security compliance agent (no independent evidence)
purpose: To perform assessment, justification, and remediation
Conceptual combination of LLM agents and RAG presented as a new system; no independent falsifiable prediction or external validation is supplied.

pith-pipeline@v0.9.0 · 5502 in / 1240 out tokens · 36976 ms · 2026-05-16T22:40:11.375014+00:00 · methodology

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Bridging the Cognitive Gap: A Unified Memory Paradigm for 6G Agentic AI-RAN
cs.NI · 2026-05 · unverdicted · novelty 4.0

A memory-centric architecture is envisioned for 6G networks to create a cognitive continuum where AI agents access multi-timescale state via zero-copy observability instead of message passing.

Agentic AI-Based Joint Computing and Networking via Mixture of Experts and Large Language Models
cs.LG · 2026-04 · unverdicted · novelty 4.0

An agentic framework uses LLMs to orchestrate MoE optimization experts for throughput, fairness, and delay objectives in joint computing and networking, achieving near-optimal simulation performance.

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages · cited by 2 Pith papers

[1] A. Maatouk, N. Piovesan, F. Ayed, A. De Domenico, and M. Debbah, “Large Language Models for Telecom: Forthcoming Impact on the Industry,” IEEE Communications Magazine, vol. 63, no. 1, pp. 62–68, 2025.

[2] H. Zhou, C. Hu, Y. Yuan, Y. Cui, Y. Jin, C. Chen, H. Wu, D. Yuan, L. Jiang, D. Wu, X. Liu, C. Zhang, X. Wang, and J. Liu, “Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities,” IEEE Communications Surveys & Tutorials, 2024.

[3] A. AbdulGhaffar and A. Matrawy, “LLMs’ Suitability for Network Security: A Case Study of STRIDE Threat Modeling,” arXiv preprint arXiv:2505.04101, 2025.

[4] T. Nguyen, H. Nguyen, A. Ijaz, S. Sheikhi, A. V. Vasilakos, and P. Kostakos, “Large Language Models in 6G Security: Challenges and Opportunities,” arXiv preprint arXiv:2403.12239, 2024.

[5] S. Chatzimiltis, M. Shojafar, M. B. Mashhadi, and R. Tafazolli, “AI-on-RAN for Cyber Defense: An XAI-LLM Framework for Interpretable Anomaly Detection,” IEEE Transactions on Network Science and Engineering, pp. 1–20, 2025.

[6] K. Dev, S. A. Khowaja, K. Singh, E. Zeydan, and M. Debbah, “Advanced Architectures Integrated with Agentic AI for Next-Generation Wireless Networks,” arXiv preprint arXiv:2502.01089, 2025.

[7] P. Sharma, H. Wen, V. Yegneswaran, A. Gehani, P. Porras, and Z. Lin, “MobiLLM: An Agentic AI Framework for Closed-Loop Threat Mitigation in 6G Open RANs,” arXiv preprint arXiv:2509.21634, 2025.

[8] Z. Nezami, S. A. R. Zaidi, M. Hafeez, J. Xu, and K. Djemame, “Toward standardization of GenAI-driven agentic architectures for radio access networks,” Frontiers in Artificial Intelligence, vol. 8, 2025.

[9] M. Elkael, S. D’Oro, L. Bonati, M. Polese, Y. Lee, K. Furueda, and T. Melodia, “AgentRAN: An Agentic AI Architecture for Autonomous Control of Open 6G Networks,” arXiv preprint arXiv:2508.17778, 2025.

[10] A. Salama, Z. Nezami, M. M. Qazzaz, M. Hafeez, and S. A. R. Zaidi, “Edge Agentic AI Framework for Autonomous Network Optimisation in O-RAN,” arXiv preprint arXiv:2507.21696, 2025.

[11] X. Wu, Y. Wang, J. Farooq, and J. Chen, “LLM-Driven Agentic AI Approach to Enhanced O-RAN Resilience in Next-Generation Networks,” Authorea Preprints, 2025.

[12] I. Chatzistefanidis, A. Leone, A. Yaghoubian, M. Irazabal, S. Nassim, L. Bariah, M. Debbah, and N. Nikaein, “MX-AI: Agentic Observability and Control Platform for Open and AI-RAN,” arXiv preprint arXiv:2508.09197, 2025.

[13] “TR.GenAI-Telecom: Potential Requirements and Methodology for Deploying and Assessing Generative AI Models in Telecom Networks,” International Telecommunication Union, ITU-T, Technical Report TR.GenAI-Telecom, March 2025.

[14] “Research Report on Generative AI Use Cases and Requirements on 6G Network,” O-RAN ALLIANCE, Next Generation Research Group (nGRG), O-RAN nGRG Research Report RR-2025-02, 2025.

[15] A. Salman, S. Creese, and M. Goldsmith, “Position Paper: Leveraging Large Language Models for Cybersecurity Compliance,” in 2024 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), 2024, pp. 496–503.

Supplementary Material

1 Agent Systems Prompts

1.1 Compliance Assessment Agent (System Prompt)

You are the Compliance Assessment Agent. Yo...

Detection phase: Carefully scan the entire configuration and list ALL security-relevant fields and their values

RAG Query Generation (one-time step):
- Before evaluating any field, you MUST call the "RAG Query Generator" exactly once.
- The output will be one or more standards-based query sentences.
- You MUST store all of these sentences for subsequent Knowledge Base calls

Standards Lookup Phase:
- For each query sentence produced by the RAG Query Generator, you MUST call the "Knowledge Base" tool once using that sentence as the query.
- You MUST use the retrieved content including its Filename metadata to understand the applicable O-RAN, 3GPP, and ETSI requirements.
- You MUST NOT guess or rely solely on internal knowledge...

Compliance Assessment Phase:
- Using all retrieved specification passages (and their Filename metadata), evaluate each security-relevant field in the configuration for compliance against O-RAN, 3GPP, and ETSI.
- Build a list of ALL violations you found. Do NOT skip any

Fixing phase:
- For EACH violation: 1 - Cite the exact standard text you relied on AND the associated Filename metadata.
- Explain the security risk.
- Apply the smallest possible fix inside the existing structure

Verification phase:
- Re-scan the corrected configuration to ensure every listed violation is now fixed.
- If any listed violation is still present, you MUST update the corrected config before responding

If any fix was applied -> Compliance Status MUST be "Non-Compliant"

Only when zero fixes are needed -> Compliance Status is "Compliant".

REVISION:
- If Previous Reflection Feedback exists, you MUST address every item first.

OUTPUT (ALWAYS):
- Compliance Status: Compliant | Non-Compliant
- Violations Found
- Specification References (include Filename metadata)
- Recommended Code Modifications
- Security Impact Analysis
- O...
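
The fixing, verification, and status rules in the system prompt excerpt above can be illustrated deterministically. The sketch below assumes the violations have already been identified and that each one names a dotted field path and the value the cited requirement calls for; the `Violation` type, the helper names, and the sample configuration are all hypothetical and not part of the paper's implementation.

```python
import copy
from dataclasses import dataclass
from typing import Any


@dataclass
class Violation:
    path: str      # dotted path inside the config, e.g. "tls.min_version" (hypothetical)
    required: Any  # value the cited requirement calls for


def get_path(config, path):
    """Read a nested value by dotted path."""
    node = config
    for key in path.split("."):
        node = node[key]
    return node


def set_path(config, path, value):
    """Overwrite a nested value by dotted path, leaving the structure unchanged."""
    *parents, leaf = path.split(".")
    node = config
    for key in parents:
        node = node[key]
    node[leaf] = value


def remediate(config, violations):
    """Apply minimal fixes, re-scan, and derive the compliance status."""
    corrected = copy.deepcopy(config)  # never mutate the submitted configuration
    for v in violations:
        set_path(corrected, v.path, v.required)  # smallest fix: change one value in place
    # Verification: every listed violation must be gone from the corrected config.
    remaining = [v for v in violations if get_path(corrected, v.path) != v.required]
    if remaining:
        raise RuntimeError(f"unfixed violations: {[v.path for v in remaining]}")
    # Status describes the submitted config: any applied fix means it was non-compliant.
    status = "Non-Compliant" if violations else "Compliant"
    return status, corrected


config = {"tls": {"min_version": "1.0"}, "ssh": {"password_auth": False}}
status, corrected = remediate(config, [Violation("tls.min_version", "1.2")])
print(status, corrected)
```

Note that under these rules the status is reported as "Non-Compliant" even though the returned configuration has been corrected, since the status refers to the input rather than the output.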