arXiv: 2605.10648 · v2 · submitted 2026-05-11 · 💻 cs.NI · cs.SY · eess.SY


Demystifying Deep Reinforcement Learning: A Neuro-Symbolic Framework for Interpretable Open RAN Automation

Jie Lu, Peihao Yan, Pang-Ning Tan, Y. Thomas Hou, Huacheng Zeng


Pith reviewed 2026-05-13 07:03 UTC · model grok-4.3


keywords: neuro-symbolic learning · deep reinforcement learning · open RAN · interpretability · network automation · symbolic regression · O-RAN testbed


The pith

DeRAN distills black-box deep reinforcement learning policies into human-readable symbolic rules for open RAN tasks while retaining most of the performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces DeRAN to address the opacity of deep reinforcement learning (DRL) in open radio access networks. A concept-driven layer first reduces network telemetry to a small set of semantic features; deep symbolic regression and neurally guided differentiable logic then turn the DRL policy into executable symbolic policies. Operators can read and audit these policies, which retain most of the original performance. The result matters because carrier networks require transparency for safe and compliant operation, which standard DRL lacks. On a live 5G O-RAN testbed, the symbolic policies reach 78% and 87% of the DRL cumulative reward in the two evaluated use cases, network slicing and mobility management.

Core claim

DeRAN bridges DRL performance and transparency by first abstracting high-dimensional telemetry into semantically meaningful concepts, then synthesizing symbolic policies via deep symbolic regression for continuous control and neurally guided differentiable logic for discrete decisions, yielding policies that achieve 78% and 87% of DRL cumulative rewards in the evaluated use cases on a live 5G O-RAN testbed.

What carries the argument

The concept-driven abstraction layer combined with deep symbolic regression (DSR) and neurally guided differentiable logic (NUDGE) that together distill black-box DRL policies into human-readable symbolic representations.

If this is right

Network operators gain direct inspectability and modifiability of control policies before deployment in carrier networks.
The framework provides built-in auditability that meets requirements for safe operation in open RAN environments.
Retaining 78% and 87% of DRL cumulative reward, i.e., giving up 22% and 13%, indicates that interpretability constraints impose a moderate cost in the tested tasks.
The method supports practical implementation on existing 5G O-RAN hardware for tasks such as slicing and mobility management.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The abstraction approach could extend to other critical infrastructure domains where opaque AI must be replaced by auditable rules.
Hybrid systems might emerge in which symbolic policies handle safety constraints while DRL handles optimization subtasks.
Operators could integrate the resulting symbolic rules into legacy rule-based network management platforms with minimal adaptation.

Load-bearing premise

The concept-driven abstraction layer transforms high-dimensional network telemetry into a compact set of semantically meaningful features without losing information critical to policy performance.

What would settle it

If the symbolic policies extracted by DeRAN achieve below 60% of DRL cumulative rewards when re-evaluated on the same live 5G testbed scenarios or produce rules that cannot be directly audited by operators, the central performance and interpretability claims would be falsified.

Figures

Figures reproduced from arXiv: 2605.10648 by Huacheng Zeng, Jie Lu, Pang-Ning Tan, Peihao Yan, Y. Thomas Hou.

**Figure 2.** Overview of DeRAN. Text captured with the figure: where d(·,·) is a distribution divergence between the teacher and student over the current buffer window N = |D|. The optimization is performed in two stages, i.e., a spec-grounded conceptizer followed by per-dimension symbolic distillers, which are detailed in §4.1. The student symbolic policy π_φ is designed to satisfy the following requirements. (i) Deterministic execution. During real…
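The excerpt captured with Figure 2 describes fitting the student by minimizing a divergence d(·,·) between teacher and student over a buffer of N samples. A minimal sketch of such an objective follows. It assumes KL divergence for d and discrete action distributions; the excerpt does not say which divergence the paper uses, and the names `kl_divergence` and `distillation_loss` are illustrative, not taken from the DeRAN code.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same actions.

    KL is an ASSUMED choice for d(.,.); eps guards against log(0).
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(
    teacher_probs: Sequence[Sequence[float]],
    student_probs: Sequence[Sequence[float]],
) -> float:
    """Average divergence over a buffer of N states (N = len(teacher_probs))."""
    n = len(teacher_probs)
    if n == 0 or n != len(student_probs):
        raise ValueError("buffers must be non-empty and the same size")
    total = sum(kl_divergence(p, q) for p, q in zip(teacher_probs, student_probs))
    return total / n


if __name__ == "__main__":
    # Hypothetical two-action buffer with three states.
    teacher = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
    student = [[0.8, 0.2], [0.3, 0.7], [0.5, 0.5]]
    print(distillation_loss(teacher, student))
```

The loss is zero when the student reproduces the teacher on every buffered state and grows as the two diverge, which is the property a distillation objective of this kind relies on.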

**Figure 3.** The logic flow of the proposed action distillation.

**Figure 4.** A multi-cell indoor 5G NR O-RAN testbed.

**Figure 6.** Reward vs. symbolic complexity. Text captured with the figure: $z_{\mathrm{eMBB}} = \log(1+c_0) + 1.32c_3 + 0.62c_0c_3 - 1.86c_2 - 0.45c_1$, $z_{\mathrm{URLLC}} = 1.74c_1 + 1.26c_1^2 + 0.62c_1c_2 - 0.74c_3$, $z_{\mathrm{mMTC}} = \frac{1}{1.42+c_0} + c_1 - 0.36\log(1+c_2) + 0.23c_3$.
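The three expressions captured with Figure 6 are closed-form scores for the eMBB, URLLC, and mMTC slices over concept features c0 to c3, so they can be evaluated directly. The sketch below transcribes them into Python. Several points are assumptions rather than facts from the page: the mMTC fraction is read as 1/(1.42 + c0) with c1 as a separate term (the extracted text is ambiguous), the concepts are taken to be non-negative, and the softmax that turns scores into resource shares is a hypothetical post-processing step added only for illustration.

```python
import math
from typing import List, Sequence, Tuple


def slice_scores(c: Sequence[float]) -> Tuple[float, float, float]:
    """Evaluate the distilled expressions for (eMBB, URLLC, mMTC).

    c = (c0, c1, c2, c3) are concept features; their meaning is not given
    on the page. Non-negative inputs are ASSUMED so that log(1 + c) is defined.
    """
    if len(c) != 4:
        raise ValueError("expected four concept features c0..c3")
    if min(c) < 0:
        raise ValueError("concept features are assumed to be non-negative")
    c0, c1, c2, c3 = c
    z_embb = math.log(1 + c0) + 1.32 * c3 + 0.62 * c0 * c3 - 1.86 * c2 - 0.45 * c1
    z_urllc = 1.74 * c1 + 1.26 * c1 ** 2 + 0.62 * c1 * c2 - 0.74 * c3
    # ASSUMED grouping of the garbled fraction: 1 / (1.42 + c0), then + c1.
    z_mmtc = 1 / (1.42 + c0) + c1 - 0.36 * math.log(1 + c2) + 0.23 * c3
    return z_embb, z_urllc, z_mmtc


def softmax_shares(z: Sequence[float]) -> List[float]:
    """HYPOTHETICAL mapping from scores to shares that sum to 1."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]  # subtract max for numerical stability
    total = sum(exps)
    return [v / total for v in exps]


if __name__ == "__main__":
    scores = slice_scores([0.5, 0.2, 0.1, 0.3])  # made-up concept values
    print(scores, softmax_shares(scores))
```

Because each score is a short arithmetic expression, an operator can trace how a change in one concept moves one slice relative to the others, which is the auditability property the page attributes to the symbolic policy.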

**Figure 8.** Control performance comparison on the resource

**Figure 10.** Reward vs. symbolic complexity. Text captured with the figure: if c5 − c4 > 0.12 and c7 < 0.47 then switch; else if c4 < 0.32 and c5 > 0.58 then switch; else if c8 > 0.50 and c5 > 0.63 and c7 < 0.42 then switch; else stay; end if.

**Figure 11.** An instance of the distilled rules for handover task.
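The if/else rule list captured alongside Figure 10 is already executable logic, and a direct transcription shows how such a policy could be run and audited. The sketch below is an illustration, not the authors' implementation: the function name, the dictionary input, and the returned rule index are hypothetical, and the page does not define what concepts c4, c5, c7, and c8 measure.

```python
from typing import Mapping, Tuple


def handover_decision(c: Mapping[str, float]) -> Tuple[str, int]:
    """Return (action, rule_index) for the distilled handover rules.

    rule_index is 1..3 for the rule that fired, or 0 for the default 'stay'.
    Thresholds are copied from the rule list; concept semantics are not
    specified on the page.
    """
    c4, c5, c7, c8 = c["c4"], c["c5"], c["c7"], c["c8"]
    if c5 - c4 > 0.12 and c7 < 0.47:
        return "switch", 1
    if c4 < 0.32 and c5 > 0.58:
        return "switch", 2
    if c8 > 0.50 and c5 > 0.63 and c7 < 0.42:
        return "switch", 3
    return "stay", 0


if __name__ == "__main__":
    # Made-up concept values for illustration only.
    sample = {"c4": 0.50, "c5": 0.70, "c7": 0.30, "c8": 0.00}
    print(handover_decision(sample))
```

Returning the index of the rule that fired gives every decision a one-line justification that can be logged, which is one concrete form the auditability claim could take.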

Original abstract

Open Radio Access Networks (O-RAN) are increasingly adopting data-driven control through Deep Reinforcement Learning (DRL) to optimize complex tasks such as network slicing and mobility management. However, the deployment of DRL in carrier-grade networks is hindered by its inherent opacity and stochastic execution, which limit operator trust, auditability, and safe deployment. Existing explainable AI (XAI) approaches primarily provide post-hoc insights and fail to produce executable, interpretable policies suitable for operational environments. In this paper, we present DeRAN, a neuro-symbolic framework that bridges the gap between DRL performance and operational transparency by distilling black-box DRL policies into human-readable symbolic representations. DeRAN introduces a concept-driven abstraction layer that transforms high-dimensional network telemetry into a compact set of semantically meaningful features, enabling interpretable policy learning. Building on the semantically grounded concepts, DeRAN synthesizes symbolic policies using deep symbolic regression (DSR) for continuous control and neurally guided differentiable logic (NUDGE) for discrete decision-making. We implement DeRAN on a live 5G O-RAN testbed and evaluate it on two representative use cases. Experimental results demonstrate that DeRAN achieves 78% and 87% of DRL's cumulative rewards in the two use cases, while offering interpretability and auditability by design. Source code is available at https://github.com/Jadejavu/DeRAN

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper introduces DeRAN, a neuro-symbolic framework for interpretable Open RAN automation. It proposes a concept-driven abstraction layer to convert high-dimensional network telemetry into compact semantic features, followed by deep symbolic regression (DSR) for continuous control and neurally guided differentiable logic (NUDGE) for discrete decisions to distill black-box DRL policies into human-readable symbolic policies. The framework is implemented and evaluated on a live 5G O-RAN testbed for two use cases (network slicing and mobility management), claiming to achieve 78% and 87% of DRL cumulative rewards while providing built-in interpretability and auditability. Source code is released.

Significance. If the performance claims hold under rigorous validation, this could meaningfully advance practical deployment of data-driven control in O-RAN by addressing opacity concerns that currently limit operator trust. The explicit focus on executable symbolic policies (rather than post-hoc explanations) and the open-source release are positive contributions to reproducibility in the field.

major comments (1)

[Concept-driven abstraction layer and experimental evaluation sections] The central performance claims (78% and 87% of DRL cumulative rewards) rest on the concept-driven abstraction layer preserving all policy-relevant information. No ablation study is presented that compares DRL performance when trained on raw telemetry versus the same abstracted feature set; without this, it is impossible to isolate whether the observed reward gap originates in the DSR/NUDGE synthesis or in irreversible compression at the abstraction step. This directly affects the validity of attributing the results to the neuro-symbolic pipeline.

minor comments (1)

[Abstract] The abstract states performance percentages without any mention of experimental setup, baselines, statistical significance, or number of runs; this should be expanded even in the abstract for a methods paper.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive and detailed feedback. The major comment raises a valid point about experimental rigor that we will address directly in revision.

Point-by-point responses

Referee: The central performance claims (78% and 87% of DRL cumulative rewards) rest on the concept-driven abstraction layer preserving all policy-relevant information. No ablation study is presented that compares DRL performance when trained on raw telemetry versus the same abstracted feature set; without this, it is impossible to isolate whether the observed reward gap originates in the DSR/NUDGE synthesis or in irreversible compression at the abstraction step. This directly affects the validity of attributing the results to the neuro-symbolic pipeline.

Authors: We agree that an ablation study isolating the abstraction layer's contribution is necessary to strengthen the attribution of results. The concept-driven abstraction is designed using O-RAN domain knowledge to map raw telemetry to a compact, semantically complete feature set without irreversible loss of policy-relevant information. However, to rigorously demonstrate this, the revised manuscript will include a new ablation experiment comparing DRL performance when trained on raw high-dimensional telemetry versus the abstracted features. This will quantify any reward degradation attributable solely to the abstraction step and clarify the source of the observed gap relative to the DSR/NUDGE synthesis.

Revision: yes
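The ablation discussed in this exchange amounts to splitting the reward gap into two parts: what is lost when DRL is trained on abstracted features instead of raw telemetry, and what is lost when the symbolic policy replaces that DRL policy. A minimal sketch of the bookkeeping is below. The function and key names are hypothetical, positive cumulative rewards are assumed, and the abstracted-feature reward in the example is a made-up placeholder, since the page reports no such measurement.

```python
from typing import Dict


def decompose_reward_gap(r_raw: float, r_abstract: float, r_symbolic: float) -> Dict[str, float]:
    """Split the gap between raw-telemetry DRL and the symbolic policy.

    r_raw:      cumulative reward of DRL trained on raw telemetry
    r_abstract: cumulative reward of DRL trained on abstracted concepts
    r_symbolic: cumulative reward of the distilled symbolic policy
    All fractions are relative to r_raw, which is ASSUMED to be positive.
    """
    if r_raw <= 0:
        raise ValueError("r_raw must be positive for relative fractions")
    return {
        "abstraction_loss": (r_raw - r_abstract) / r_raw,
        "synthesis_loss": (r_abstract - r_symbolic) / r_raw,
        "retention": r_symbolic / r_raw,
    }


if __name__ == "__main__":
    # 78.0 mirrors the reported 78% retention; 90.0 is a PLACEHOLDER value.
    print(decompose_reward_gap(r_raw=100.0, r_abstract=90.0, r_symbolic=78.0))
```

The three fractions always sum to one, so a reported retention such as 78% fixes only the total loss; the ablation is what would pin down how that loss divides between abstraction and synthesis.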

Circularity Check

0 steps flagged

No circularity detected; claims rest on independent testbed evaluation of neuro-symbolic distillation

full rationale

The paper presents DeRAN as a framework that applies a concept-driven abstraction to telemetry, then uses DSR and NUDGE to synthesize symbolic policies from DRL. The 78% and 87% reward figures are reported as outcomes of live 5G O-RAN testbed runs rather than quantities derived by construction from fitted parameters or prior self-referential definitions. No equations, uniqueness theorems, or ansatzes in the provided text reduce the performance claims to the inputs by definition, and the cited methods (DSR, NUDGE) are treated as external building blocks. The derivation chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the approach relies on existing DSR and NUDGE techniques whose internal assumptions are not detailed here.

pith-pipeline@v0.9.0 · 5575 in / 1003 out tokens · 63444 ms · 2026-05-13T07:03:59.707358+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel

Relation between the paper passage and the cited Recognition theorem: unclear

Paper passage: DeRAN introduces a concept-driven abstraction layer that transforms high-dimensional network telemetry into a compact set of semantically meaningful features, enabling interpretable policy learning. Building on the semantically grounded concepts, DeRAN synthesizes symbolic policies using deep symbolic regression (DSR) for continuous control and neurally guided differentiable logic (NUDGE) for discrete decision-making.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction

Relation between the paper passage and the cited Recognition theorem: unclear

Paper passage: We implement DeRAN on a live 5G O-RAN testbed and evaluate it on two representative use cases. Experimental results demonstrate that DeRAN achieves 78% and 87% of DRL's cumulative rewards

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages

[1] 3GPP. Management and orchestration; 5G performance measurements. Technical Specification (TS) 28.552, 3rd Generation Partnership Project (3GPP), 2024. Version 19.1.0, Release 19.

[2] 3GPP. NR; Radio Resource Control (RRC); Protocol specification. Technical Specification (TS) 38.331, 3rd Generation Partnership Project (3GPP), 2025. Version 18.5.1, Release 18.

[3] Hai Cheng, Salvatore D’Oro, Rajeev Gangula, Sakthivel Velumani, Davide Villa, Leonardo Bonati, Michele Polese, Tommaso Melodia, Gabriel Arrobo, and Christian Maciocco. Oranslice: An open source 5G network slicing platform for O-RAN. In Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, pages 2297–2302, 2024.

[4] Quentin Delfosse, Hikaru Shindo, Devendra Dhami, and Kristian Kersting. Interpretable and explainable logical policies via neurally guided symbolic abstraction. Advances in Neural Information Processing Systems, 36:50838–50858, 2023.

[5] Abhishek Duttagupta, MohammadErfan Jabbari, Claudio Fiandrino, Marco Fiore, and Joerg Widmer. Symbxrl: Symbolic explainable deep reinforcement learning for mobile networks. In IEEE INFOCOM 2025-IEEE Conference on Computer Communications, pages 1–10. IEEE, 2025.

[6] Claudio Fiandrino, Leonardo Bonati, Salvatore D’Oro, Michele Polese, Tommaso Melodia, and Joerg Widmer. Explora: AI/ML explainability for the open RAN. Proceedings of the ACM on Networking, 1(CoNEXT3):1–26, 2023.

[7] Claudio Fiandrino, Eloy Pérez Gómez, Pablo Fernández Pérez, Hossein Mohammadalizadeh, Marco Fiore, and Joerg Widmer. Aichronolens: Advancing explainability for time series AI forecasting in mobile networks. In IEEE INFOCOM 2024-IEEE Conference on Computer Communications, pages 1521–1530. IEEE, 2024.

[8] Joseph Y Halpern and Vicky Weissman. Using first-order logic to reason about policies. ACM Transactions on Information and System Security (TISSEC), 11(4):1–41, 2008.

[9] Zhihao Hu, Yaozong Yang, Wei Gu, Ying Chen, and Jiwei Huang. DRL-based trajectory optimization and task offloading in hierarchical aerial MEC. IEEE Internet of Things Journal, 12(3):3410–3423, 2024.

[10] MohammadErfan Jabbari, Abhishek Duttagupta, Claudio Fiandrino, Leonardo Bonati, Salvatore D’Oro, Michele Polese, Marco Fiore, and Tommaso Melodia. Sia: Symbolic interpretability for anticipatory deep reinforcement learning in network control. arXiv preprint arXiv:2601.22044, 2026.

[11] Lianchen Jia, Chaoyang Li, Ziqi Yuan, Jiahui Chen, Tianchi Huang, Jiangchuan Liu, and Lifeng Sun. Beyond interpretability: Exploring the comprehensibility of adaptive video streaming through large language models. In Proceedings of the 33rd ACM International Conference on Multimedia, pages 12035–12044, 2025.

[12] Jie Lu, Peihao Yan, and Huacheng Zeng. Eexapp: GNN-based reinforcement learning for radio unit energy optimization in 5G O-RAN. arXiv preprint arXiv:2602.09206, 2026.

[13] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30, 2017.

[14] Irshad A Meer, Karl-Ludwig Besser, Mustafa Ozger, Dominic Schupke, H Vincent Poor, and Cicek Cavdar. Hierarchical multi-agent DRL based dynamic cluster reconfiguration for UAV mobility management. IEEE Transactions on Cognitive Communications and Networking, 2025.

[15] Zili Meng, Minhu Wang, Jiasong Bai, Mingwei Xu, Hongzi Mao, and Hongxin Hu. Interpreting deep learning-based networking systems. In Proceedings of the Annual conference of the ACM Special Interest Group on Data Communication on the applications, technologies, architectures, and protocols for computer communication, pages 154–171, 2020.

[16] Lorenzo Pappone, Alessio Sacco, and Flavio Esposito. Mutant: Learning congestion control from existing protocols via online reinforcement learning. In 22nd USENIX Symposium on Networked Systems Design and Implementation (NSDI 25), pages 1507–1522, 2025.

[17] Brenden K Petersen, Mikel Landajuela, T Nathan Mundhenk, Claudio P Santiago, Soo K Kim, and Joanne T Kim. Deep symbolic regression: Recovering mathematical expressions from data via risk-seeking policy gradients. arXiv preprint arXiv:1912.04871, 2019.

[18] Michele Polese, Leonardo Bonati, Salvatore D’Oro, Stefano Basagni, and Tommaso Melodia. Understanding O-RAN: Architecture, interfaces, algorithms, security, and research challenges. IEEE Communications Surveys & Tutorials, 25(2):1376–1411, 2023.

[19] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016.

[20] Andrei A Rusu, Sergio Gomez Colmenarejo, Caglar Gulcehre, Guillaume Desjardins, James Kirkpatrick, Razvan Pascanu, Volodymyr Mnih, Koray Kavukcuoglu, and Raia Hadsell. Policy distillation. arXiv preprint arXiv:1511.06295, 2015.

[21] SRS. oran-sc-ric. [Online]. Available: https://github.com/srsran/oran-sc-ric, 2024.

[22] SRS. srsRAN Project. [Online]. Available: https://github.com/srsran/srsRAN_Project, 2025.

[23] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017.

[24] Peihao Yan, Jie Lu, Huacheng Zeng, and Y Thomas Hou. Near-real-time resource slicing for QoS optimization in 5G O-RAN using deep reinforcement learning. arXiv preprint arXiv:2509.14343, 2025.

[25] Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Russ R Salakhutdinov, and Alexander J Smola. Deep sets. Advances in Neural Information Processing Systems, 30, 2017.

[26] Jian Zhang, Xu Wang, Hongyu Zhang, Hailong Sun, Kaixuan Wang, and Xudong Liu. A novel neural source code representation based on abstract syntax tree. In 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), pages 783–794. IEEE, 2019.

[27] Ming Zhao, Yuru Zhang, Qiang Liu, Ahan Kak, and Nakjung Choi. inran: Interpretable online Bayesian learning for network automation in open radio access networks. arXiv preprint arXiv:2601.03219, 2026.