pith. sign in

arxiv: 2604.04914 · v1 · submitted 2026-04-06 · 💻 cs.NI · cs.AI· cs.LG

Analyzing Symbolic Properties for DRL Agents in Systems and Networking

Pith reviewed 2026-05-10 18:37 UTC · model grok-4.3

classification 💻 cs.NI cs.AIcs.LG
keywords deep reinforcement learningsymbolic propertiesDNN verificationmonotonicityrobustnesssystems and networkingcounterexamplespolicy verification
0
0 comments X

The pith

Symbolic properties let DRL agents be checked over ranges of system states instead of single points.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a way to express and verify properties that describe how a deep reinforcement learning agent should behave across intervals of inputs rather than at one fixed state. It shows these symbolic properties can be turned into comparisons between related runs of the same policy and then split into smaller pieces that standard neural-network verifiers can handle. The approach is tested on agents for adaptive video streaming, wireless resource allocation, and congestion control. A sympathetic reader would care because point-wise checks leave large parts of the operating space unexamined, while range-based checks can surface operationally relevant failures with less manual setup.

Core claim

Symbolic properties can be analyzed using existing DNN verification engines once they are encoded as comparisons between related executions of the policy and decomposed into tractable sub-properties; when applied to three DRL control systems this yields substantially broader coverage than point properties and surfaces non-obvious counterexamples.

What carries the argument

Decomposition of symbolic properties into comparisons between related policy executions that existing DNN verifiers can solve soundly.

If this is right

  • Verification becomes feasible over wide input ranges in video streaming, wireless management, and congestion control without enumerating every state.
  • Satisfaction of the same symbolic property can be tracked as the agent trains.
  • Model size can be related directly to how many of the decomposed sub-properties remain solvable.
  • Different verification backends can be compared on the same symbolic properties to expose practical speed and completeness trade-offs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Engineers could use the same encoding technique to define new properties beyond monotonicity and robustness that fit their particular system.
  • The observed solver limitations point toward the need for verifiers that accept the comparison form directly rather than requiring manual decomposition.
  • If the decomposition preserves semantics, the method could be reused on any policy whose inputs and outputs can be related by simple arithmetic constraints.

Load-bearing premise

Symbolic properties can be rewritten as comparisons between related executions and split into sub-properties that DNN verifiers solve without changing what the original property meant.

What would settle it

A concrete case in one of the three systems where the decomposed sub-properties accept a policy that violates the original symbolic property (or vice versa).

Figures

Figures reproduced from arXiv: 2604.04914 by Amr Rizk, Jannis Weil, Mina Tahmasbi Arashloo, Mohammad Zangooei, Raouf Boutaba.

Figure 1
Figure 1. Figure 1: Point-wise versus symbolic property analysis. Left (Prior Work): Analysis is performed at individual points that humans provide, checking that the property holds under bounded perturbations around one concrete point in the input space. In this example, the property asserts that the output action does not change. Checking properties only at individual points results in limited coverage of the input space an… view at source ↗
Figure 2
Figure 2. Figure 2: Symbolic properties over DRL agent execution pairs (§3). A symbolic input 𝑥 from the region of interest in the input space and its bounded perturbation 𝑥 + 𝑠 induce two executions of the agent’s policy 𝜋. The policy 𝜋 is based on a non-linear DNN. As such, input perturbations may result in jumps in the DNN’s multi-dimensional continuous output space (see blue highlights in the center). The DNN’s output may… view at source ↗
Figure 3
Figure 3. Figure 3: diffRL’s overview (§4). Starting from a trained DNN and a symbolic property specification, diffRL applies a comparative encoding to construct two coupled executions of the same policy under a bounded input perturbation. The resulting formulation is decomposed into multiple queries, dispatched to heterogeneous solvers. The individual solver outcomes are then aggregated to produce the final verification resu… view at source ↗
Figure 4
Figure 4. Figure 4: (a) and (b) illustrate the analysis results for the [PITH_FULL_IMAGE:figures/full_fig_p013_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Query execution time comparison for verifying [PITH_FULL_IMAGE:figures/full_fig_p014_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Analysis results for Rebuffering Avoidance (left) and Robustness (right) for [PITH_FULL_IMAGE:figures/full_fig_p015_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Comparison of analysis outcomes across solvers and properties. The figure reports the number of safe, [PITH_FULL_IMAGE:figures/full_fig_p017_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Query execution time (in seconds, log scale) for verifying the channel compensation property of CMARS [PITH_FULL_IMAGE:figures/full_fig_p018_8.png] view at source ↗
read the original abstract

Deep reinforcement learning (DRL) has shown remarkable performance on complex control problems in systems and networking, including adaptive video streaming, wireless resource management, and congestion control. For safe deployment, however, it is critical to reason about how agents behave across the range of system states they encounter in practice. Existing verification-based methods in this domain primarily focus on point properties, defined around fixed input states, which offer limited coverage and require substantial manual effort to identify relevant input-output pairs for analysis. In this paper, we study symbolic properties, that specify expected behavior over ranges of input states, for DRL agents in systems and networking. We present a generic formulation for symbolic properties, with monotonicity and robustness as concrete examples, and show how they can be analyzed using existing DNN verification engines. Our approach encodes symbolic properties as comparisons between related executions of the same policy and decomposes them into practically tractable sub-properties. These techniques serve as practical enablers for applying existing verification tools to symbolic analysis. Using our framework, diffRL, we conduct an extensive empirical study across three DRL-based control systems, adaptive video streaming, wireless resource management, and congestion control. Through these case studies, we analyze symbolic properties over broad input ranges, examine how property satisfaction evolves during training, study the impact of model size on verifiability, and compare multiple verification backends. Our results show that symbolic properties provide substantially broader coverage than point properties and can uncover non-obvious, operationally meaningful counterexamples, while also revealing practical solver trade-offs and limitations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper introduces the diffRL framework for verifying symbolic properties (e.g., monotonicity and robustness) of DRL agents in systems and networking domains such as adaptive video streaming, wireless resource management, and congestion control. It formulates symbolic properties over ranges of states, encodes them as comparisons between related policy executions, decomposes the resulting queries into sub-properties amenable to existing DNN verification engines, and reports an empirical study showing that these properties yield substantially broader coverage than point properties while surfacing operationally relevant counterexamples.

Significance. If the decomposition is shown to be sound, the work would meaningfully advance verification practice for DRL controllers in networked systems by replacing labor-intensive point-wise checks with range-based symbolic analysis. The three-system empirical evaluation, comparison of verification backends, and examination of training dynamics and model-size effects provide concrete evidence of practical utility and solver limitations.

major comments (1)
  1. [Abstract / diffRL framework] Abstract and the description of the diffRL encoding/decomposition: the central claim that symbolic properties deliver substantially broader coverage and non-obvious counterexamples rests on the assertion that the decomposition into DNN-verifiable sub-properties is sound and semantics-preserving. No formal argument, completeness guarantee, or counterexample-mapping proof is supplied showing that sub-property satisfaction implies original-property satisfaction (and vice versa for violations). Because DNN verifiers are typically sound but incomplete, any semantic gap introduced by the decomposition would directly undermine the reported coverage gains and counterexample findings.
minor comments (1)
  1. [Abstract] The abstract refers to 'practical enablers' without indicating whether the decomposition is complete or only sound; clarifying this distinction would help readers assess the strength of the empirical results.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive review and for recognizing the potential of diffRL to advance verification of DRL agents in networked systems. We address the major comment on the soundness of the decomposition below and will revise the manuscript to strengthen the formal foundations.

read point-by-point responses
  1. Referee: [Abstract / diffRL framework] Abstract and the description of the diffRL encoding/decomposition: the central claim that symbolic properties deliver substantially broader coverage and non-obvious counterexamples rests on the assertion that the decomposition into DNN-verifiable sub-properties is sound and semantics-preserving. No formal argument, completeness guarantee, or counterexample-mapping proof is supplied showing that sub-property satisfaction implies original-property satisfaction (and vice versa for violations). Because DNN verifiers are typically sound but incomplete, any semantic gap introduced by the decomposition would directly undermine the reported coverage gains and counterexample findings.

    Authors: We agree that a formal argument for the soundness and semantics-preservation of the decomposition is necessary to support the central claims. The current manuscript presents the encoding of symbolic properties (e.g., monotonicity and robustness) as comparisons between related policy executions and their decomposition into sub-properties, but does not include an explicit proof. In the revised version we will add a dedicated subsection with a formal argument showing that (i) satisfaction of the decomposed sub-properties implies satisfaction of the original symbolic property over the input range, and (ii) any violation found by a sound DNN verifier on a sub-property corresponds to a valid counterexample for the original property. The argument will be based on the range-partitioning strategy and the fact that the comparison-based encoding preserves the semantics of the original predicate; we will also discuss the implications of verifier incompleteness for the reported coverage results. revision: yes

Circularity Check

0 steps flagged

No significant circularity; new encoding and empirical validation are independent of inputs

full rationale

The paper introduces a generic formulation for symbolic properties (with monotonicity and robustness as examples), encodes them as comparisons between related policy executions, and decomposes them into sub-properties for use with existing DNN verification engines. These steps are presented as a methodological contribution (diffRL) that enables application of external tools, followed by empirical case studies on three DRL systems to measure coverage, training evolution, model size impact, and solver trade-offs. The reported results on broader coverage and counterexamples are obtained by running the framework on concrete instances rather than being forced by construction from fitted parameters, self-referential definitions, or load-bearing self-citations. No uniqueness theorems, ansatzes smuggled via prior work, or renaming of known results are invoked in the derivation chain. The approach is self-contained against external benchmarks (DNN verifiers) and does not reduce any central claim to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The claim rests on the domain assumption that existing DNN verifiers remain sound when applied to the decomposed sub-properties derived from symbolic comparisons; no free parameters or new invented entities with independent evidence are described.

axioms (1)
  • domain assumption Existing DNN verification engines can be applied to the decomposed sub-properties derived from symbolic property encodings without loss of soundness
    The paper states that symbolic properties are analyzed using existing DNN verification engines after decomposition.
invented entities (1)
  • diffRL framework no independent evidence
    purpose: To encode and decompose symbolic properties for verification of DRL agents
    New framework introduced to enable the symbolic analysis described.

pith-pipeline@v0.9.0 · 5596 in / 1446 out tokens · 64960 ms · 2026-05-10T18:37:42.222078+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

70 extracted references · 70 canonical work pages

  1. [1]

    v2.0.0. Marabou. https://github.com/NeuralNetworkVerification/Marabou. Accessed: September, 2024

  2. [2]

    Soheil Abbasloo, Chen-Yu Yen, and H Jonathan Chao. 2020. Classic meets modern: A pragmatic learning-based congestion control for the internet. InProceedings of the Annual conference of the ACM Special Interest Group on Data Communication on the applications, technologies, architectures, and protocols for computer communication. 632–647

  3. [3]

    Anup Agarwal, Venkat Arun, Devdeep Ray, Ruben Martins, and Srinivasan Seshan. 2024. Towards provably performant congestion control. In21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24). 951–978

  4. [4]

    Mina Tahmasbi Arashloo, Ryan Beckett, and Rachit Agarwal. 2023. Formal methods for network performance analysis. In20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23). 645–661

  5. [5]

    Venkat Arun, Mina Tahmasbi Arashloo, Ahmed Saeed, Mohammad Alizadeh, and Hari Balakrishnan. 2021. Toward formally verifying congestion control behavior. InProceedings of the 2021 ACM SIGCOMM 2021 Conference. 1–16

  6. [6]

    David Boetius and Stefan Leue. 2024. Counterexample-Guided Repair of Reinforcement Learning Systems Using Safety Critics.arXiv preprint arXiv:2405.15430(2024)

  7. [7]

    Shaileshh Bojja Venkatakrishnan, Shreyan Gupta, Hongzi Mao, Mohammad Alizadeh, et al. 2019. Learning Generalizable Device Placement Algorithms for Distributed Machine Learning.Advances in Neural Information Processing Systems32 (2019)

  8. [8]

    Christopher Brix, Stanley Bak, Changliu Liu, and Taylor T Johnson. 2023. The fourth international verification of neural networks competition (VNN-COMP 2023): Summary and results.arXiv preprint arXiv:2312.16760(2023)

  9. [9]

    Rudy Bunel, Jingyue Lu, Ilker Turkaslan, Philip HS Torr, Pushmeet Kohli, and M Pawan Kumar. 2020. Branch and bound for piecewise linear neural network verification.Journal of Machine Learning Research21, 42 (2020), 1–39

  10. [10]

    Arnav Chakravarthy, Nina Narodytska, Asmitha Rathis, Marius Vilcu, Mahmood Sharif, and Gagandeep Singh. 2022. Property-Driven Evaluation of RL-Controllers in Self-Driving Datacenters. InWorkshop on Challenges in Deploying and Monitoring Machine Learning Systems, NeurIPS Virtual Workshop

  11. [11]

    Li Chen, Justinas Lingys, Kai Chen, and Feng Liu. 2018. Auto: Scaling deep reinforcement learning for datacenter- scale automatic traffic optimization. InProceedings of the 2018 conference of the ACM special interest group on data communication. 191–205

  12. [12]

    Sandeep Chinchali, Pan Hu, Tianshu Chu, Manu Sharma, Manu Bansal, Rakesh Misra, Marco Pavone, and Sachin Katti

  13. [13]

    InProceedings of the AAAI Conference on Artificial Intelligence, Vol

    Cellular network traffic scheduling with deep reinforcement learning. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 32

  14. [14]

    Micha Dery, Orr Krupnik, and Isaac Keslassy. 2023. QueuePilot: Reviving Small Buffers With a Learned AQM Policy. InIEEE INFOCOM 2023-IEEE Conference on Computer Communications. IEEE, 1–10

  15. [15]

    Arnaud Dethise, Marco Canini, and Srikanth Kandula. 2019. Cracking open the black box: What observations can tell us about reinforcement learning agents. InProceedings of the 2019 Workshop on Network Meets AI & ML. 29–36

  16. [16]

    Arnaud Dethise, Marco Canini, and Nina Narodytska. 2021. Analyzing learning-based networked systems with formal verification. InIEEE INFOCOM 2021-IEEE Conference on Computer Communications. IEEE, 1–10

  17. [17]

    Tomer Eliyahu, Yafim Kazak, Guy Katz, and Michael Schapira. 2021. Verifying learning-augmented systems. In Proceedings of the 2021 ACM SIGCOMM 2021 Conference. 305–318

  18. [18]

    Briti Gangopadhyay, Pallab Dasgupta, and Soumyajit Dey. 2023. Counterexample-Guided Policy Refinement in Multi-Agent Reinforcement Learning. InProceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems. 1606–1614

  19. [19]

    Tomer Gilad, Nathan H Jay, Michael Shnaiderman, Brighten Godfrey, and Michael Schapira. 2019. Robustifying network protocols with adversarial examples. InProceedings of the 18th ACM Workshop on Hot Topics in Networks. 85–92

  20. [20]

    Fei Gui, Songtao Wang, Dan Li, Li Chen, Kaihui Gao, Congcong Min, and Yi Wang. 2024. RedTE: Mitigating subsecond traffic bursts with real-time and distributed traffic engineering. InProceedings of the ACM SIGCOMM 2024 Conference. 71–85

  21. [21]

    2024.Gurobi Optimizer Reference Manual

    Gurobi Optimization, LLC. 2024.Gurobi Optimizer Reference Manual. https://www.gurobi.com

  22. [22]

    Yi Hu, Chaoran Zhang, Edward Andert, Harshul Singh, Aviral Shrivastava, James Laudon, Yanqi Zhou, Bob Iannucci, and Carlee Joe-Wong. 2023. GiPH: Generalizable Placement Learning for Adaptive Heterogeneous Computing.Proceedings of Machine Learning and Systems5 (2023)

  23. [23]

    Yangfan Huang, Yuling Lin, Haizhou Du, Yijian Chen, Haohao Song, Linghe Kong, Qiao Xiang, Qiang Li, Franck Le, and Jiwu Shu. 2023. Toward a Unified Framework for Verifying and Interpreting Learning-Based Networking Systems. Proc. ACM Meas. Anal. Comput. Syst., Vol. 10, No. 2, Article 29. Publication date: June 2026. Analyzing Symbolic Properties for DRL A...

  24. [24]

    2019.IBM ILOG CPLEX Optimization Studio

    IBM Corporation. 2019.IBM ILOG CPLEX Optimization Studio. IBM

  25. [25]

    Nathan Jay, Noga Rotman, Brighten Godfrey, Michael Schapira, and Aviv Tamar. 2019. A deep reinforcement learning perspective on internet congestion control. InInternational Conference on Machine Learning. PMLR, 3050–3059

  26. [26]

    Lianchen Jia, Chao Zhou, Tianchi Huang, Chaoyang Li, and Lifeng Sun. 2023. RDladder: Resolution-Duration Ladder for VBR-encoded Videos via Imitation Learning. InIEEE INFOCOM 2023-IEEE Conference on Computer Communications. IEEE, 1–10

  27. [27]

    Lianchen Jia, Chao Zhou, Tianchi Huang, Chaoyang Li, and Lifeng Sun. 2024. Dancing with Shackles, Meet the Challenge of Industrial Adaptive Streaming via Offline Reinforcement Learning. InIEEE INFOCOM 2024-IEEE Conference on Computer Communications. IEEE, 2169–2178

  28. [28]

    Shuowei Jin, Francis Y Yan, Cheng Tan, Anuj Kalia, Xenofon Foukas, and Z Morley Mao. 2024. AutoSpec: Automated Generation of Neural Network Specifications.arXiv preprint arXiv:2409.10897(2024)

  29. [29]

    Suraj Jog, Zikun Liu, Antonio Franques, Vimuth Fernando, Sergi Abadal, Josep Torrellas, and Haitham Hassanieh

  30. [30]

    In18th USENIX Symposium on Networked Systems Design and Implementation (NSDI 21)

    One protocol to rule them all: Wireless Network-on-Chip using deep reinforcement learning. In18th USENIX Symposium on Networked Systems Design and Implementation (NSDI 21). 973–989

  31. [31]

    Guy Katz, Clark Barrett, David L Dill, Kyle Julian, and Mykel J Kochenderfer. 2017. Reluplex: An efficient SMT solver for verifying deep neural networks. InComputer Aided Verification: 29th International Conference, CA V 2017, Heidelberg, Germany, July 24-28, 2017, Proceedings, Part I 30. Springer, 97–117

  32. [32]

    Woo-Hyun Ko, Ushasi Ghosh, Ujwal Dinesha, Raini Wu, Srinivas Shakkottai, and Dinesh Bharadia. 2024. EdgeRIC: Empowering Real-time Intelligent Optimization and Control in NextG Cellular Networks. In21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24). USENIX Association, Santa Clara, CA, 1315–1330. https: //www.usenix.org/conferen...

  33. [33]

    Suhas Kotha, Christopher Brix, J Zico Kolter, Krishnamurthy Dvijotham, and Huan Zhang. 2023. Provably bounding neural network preimages.Advances in Neural Information Processing Systems36 (2023), 80270–80290

  34. [34]

    Brighten Godfrey

    Bingzhe Liu, Gangmuk Lim, Ryan Beckett, and P. Brighten Godfrey. 2024. Kivi: Verification for Cluster Management. In2024 USENIX Annual Technical Conference (USENIX ATC 24). USENIX Association, Santa Clara, CA, 509–527. https://www.usenix.org/conference/atc24/presentation/liu-bingzhe

  35. [35]

    Zikun Liu, Changming Xu, Yuqing Xie, Emerson Sie, Fan Yang, Kevin Karwaski, Gagandeep Singh, Zhao Lucis Li, Yu Zhou, Deepak Vasisht, et al. 2023. Exploring practical vulnerabilities of machine learning-based wireless systems. In 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23). 1801–1817

  36. [36]

    Yiqing Ma, Han Tian, Xudong Liao, Junxue Zhang, Weiyan Wang, Kai Chen, and Xin Jin. 2022. Multi-objective congestion control. InProceedings of the Seventeenth European Conference on Computer Systems. 218–235

  37. [37]

    Hongzi Mao, Mohammad Alizadeh, Ishai Menache, and Srikanth Kandula. 2016. Resource management with deep reinforcement learning. InProceedings of the 15th ACM workshop on hot topics in networks. 50–56

  38. [38]

    Hongzi Mao, Ravi Netravali, and Mohammad Alizadeh. 2017. Neural adaptive video streaming with pensieve. In Proceedings of the conference of the ACM special interest group on data communication. 197–210

  39. [39]

    Hongzi Mao, Malte Schwarzkopf, Shaileshh Bojja Venkatakrishnan, Zili Meng, and Mohammad Alizadeh. 2019. Learning scheduling algorithms for data processing clusters. InProceedings of the ACM special interest group on data communication. 270–288

  40. [40]

    Zili Meng, Minhu Wang, Jiasong Bai, Mingwei Xu, Hongzi Mao, and Hongxin Hu. 2020. Interpreting deep learning-based networking systems. InProceedings of the Annual conference of the ACM Special Interest Group on Data Communication on the applications, technologies, architectures, and protocols for computer communication. 154–171

  41. [41]

    Alessandro Montenegro, Marco Mussi, Alberto Maria Metelli, and Matteo Papini. 2024. Learning optimal deterministic policies with stochastic policy gradients. InProceedings of the 41st International Conference on Machine Learning (ICML’24). JMLR.org, Vienna, Austria, Article 1473, 52 pages

  42. [42]

    Janosch Moos, Kay Hansel, Hany Abdulsamad, Svenja Stark, Debora Clever, and Jan Peters. 2022. Robust Reinforcement Learning: A Review of Foundations and Recent Advances.Machine Learning and Knowledge Extraction4, 1 (2022), 276–315. doi:10.3390/make4010013

  43. [43]

    Mark Niklas Müller, Christopher Brix, Stanley Bak, Changliu Liu, and Taylor T Johnson. 2022. The third international verification of neural networks competition (VNN-COMP 2022): Summary and results.arXiv preprint arXiv:2212.10376 (2022)

  44. [44]

    Jinwoo Park, Jaehyeong Park, Youngmok Jung, Hwijoon Lim, Hyunho Yeo, and Dongsu Han. 2024. TopFull: An Adaptive Top-Down Overload Control for SLO-Oriented Microservices. InProceedings of the ACM SIGCOMM 2024 Conference. 876–890

  45. [45]

    Parsa Pazhooheshy, Soheil Abbasloo, and Yashar Ganjali. 2023. Harnessing ML For Network Protocol Assessment: A Congestion Control Use Case. InProceedings of the 22nd ACM Workshop on Hot Topics in Networks. 213–219. Proc. ACM Meas. Anal. Comput. Syst., Vol. 10, No. 2, Article 29. Publication date: June 2026. 29:26 Mohammad Zangooei, Jannis Weil, Amr Rizk, ...

  46. [46]

    Haoran Qiu, Subho S Banerjee, Saurabh Jha, Zbigniew T Kalbarczyk, and Ravishankar K Iyer. 2020. FIRM: An intelligent fine-grained resource management framework for SLO-Oriented microservices. In14th USENIX symposium on operating systems design and implementation (OSDI 20). 805–825

  47. [47]

    Haoran Qiu, Weichao Mao, Archit Patke, Shengkun Cui, Chen Wang, Hubertus Franke, Zbigniew T Kalbarczyk, Tamer Başar, and Ravishankar K Iyer. 2024. FLASH: Fast model adaptation in ML-centric cloud platforms.Proceedings of Machine Learning and Systems6 (2024), 524–544

  48. [48]

    Davor Runje and Sharath M Shankaranarayana. 2023. Constrained monotonic neural networks. InInternational Conference on Machine Learning. PMLR, 29338–29353

  49. [49]

    Qianli Shen, Yan Li, Haoming Jiang, Zhaoran Wang, and Tuo Zhao. 2020. Deep reinforcement learning with robust and smooth policy. InInternational Conference on Machine Learning. PMLR, 8707–8718

  50. [50]

    Zhouxing Shi, Qirui Jin, Zico Kolter, Suman Jana, Cho-Jui Hsieh, and Huan Zhang. 2025. Neural network verifica- tion with branch-and-bound for general nonlinearities. InInternational Conference on Tools and Algorithms for the Construction and Analysis of Systems. Springer, 315–335

  51. [51]

    Xudong Sun, Wenjie Ma, Jiawei Tyler Gu, Zicheng Ma, Tej Chajed, Jon Howell, Andrea Lattuada, Oded Padon, Lalith Suresh, Adriana Szekeres, and Tianyin Xu. 2024. Anvil: Verifying Liveness of Cluster Management Controllers. In18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24). USENIX Association, Santa Clara, CA, 649–666. https://...

  52. [52]

    2018.Reinforcement learning: An introduction

    Richard S Sutton and Andrew G Barto. 2018.Reinforcement learning: An introduction. MIT press

  53. [53]

    Jiaxin Tang, Sen Liu, Yang Xu, Zehua Guo, Junjie Zhang, Peixuan Gao, Yang Chen, Xin Wang, and H Jonathan Chao

  54. [54]

    InIEEE INFOCOM 2022-IEEE Conference on Computer Communications

    ABS: Adaptive buffer sizing via augmented programmability with machine learning. InIEEE INFOCOM 2022-IEEE Conference on Computer Communications. IEEE, 2038–2047

  55. [55]

    Chen Tessler, Yuval Shpigelman, Gal Dalal, Amit Mandelbaum, Doron Haritan Kazakov, Benjamin Fuhrer, Gal Chechik, and Shie Mannor. 2022. Reinforcement learning for datacenter congestion control.ACM SIGMETRICS Performance Evaluation Review49, 2 (2022), 43–46

  56. [56]

    Xiao, and Russ Tedrake

    Vincent Tjeng, Kai Y. Xiao, and Russ Tedrake. 2019. Evaluating Robustness of Neural Networks with Mixed Integer Programming. InInternational Conference on Learning Representations. https://openreview.net/forum?id=HyGIdiRqtm

  57. [57]

    Aladin Virmaux and Kevin Scaman. 2018. Lipschitz regularity of deep neural networks: analysis and efficient estimation. Advances in Neural Information Processing Systems31 (2018)

  58. [58]

    Mowei Wang, Sijiang Huang, Yong Cui, Wendong Wang, and Zhenhua Liu. 2022. Learning Buffer Management Policies for Shared Memory Switches. InIEEE INFOCOM 2022 - IEEE Conference on Computer Communications. 730–739. doi:10.1109/INFOCOM48880.2022.9796784

  59. [59]

    Shiqi Wang, Huan Zhang, Kaidi Xu, Xue Lin, Suman Jana, Cho-Jui Hsieh, and J Zico Kolter. 2021. Beta-crown: Efficient bound propagation with per-neuron split constraints for neural network robustness verification.Advances in Neural Information Processing Systems34 (2021), 29909–29921

  60. [60]

    Zibo Wang, Pinghe Li, Chieh-Jan Mike Liang, Feng Wu, and Francis Y Yan. 2024. Autothrottle: A Practical Bi-Level Approach to Resource Management for SLO-Targeted Microservices. In21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24). 149–165

  61. [61]

    Antoine Wehenkel and Gilles Louppe. 2019. Unconstrained monotonic neural networks.Advances in neural information processing systems32 (2019)

  62. [62]

    Dennis Wei, Haoze Wu, Min Wu, Pin-Yu Chen, Clark Barrett, and Eitan Farchi. 2023. Convex bounds on the softmax function with applications to robustness verification. InInternational Conference on Artificial Intelligence and Statistics. PMLR, 6853–6878

  63. [63]

    Haoze Wu, Clark Barrett, Mahmood Sharif, Nina Narodytska, and Gagandeep Singh. 2022. Scalable verification of GNN-based job schedulers.Proceedings of the ACM on Programming Languages6, OOPSLA2 (2022), 1036–1065

  64. [64]

    Haoze Wu, Omri Isac, Aleksandar Zeljić, Teruhiro Tagomori, Matthew Daggitt, Wen Kokke, Idan Refaeli, Guy Amir, Kyle Julian, Shahaf Bassan, et al. 2024. Marabou 2.0: a versatile formal analyzer of neural networks. InInternational Conference on Computer Aided Verification. Springer, 249–264

  65. [65]

    Kaidi Xu, Zhouxing Shi, Huan Zhang, Yihan Wang, Kai-Wei Chang, Minlie Huang, Bhavya Kailkhura, Xue Lin, and Cho-Jui Hsieh. 2020. Automatic perturbation analysis for scalable certified robustness and beyond.Advances in Neural Information Processing Systems33 (2020), 1129–1141

  66. [66]

    Kaidi Xu, Huan Zhang, Shiqi Wang, Yihan Wang, Suman Jana, Xue Lin, and Cho-Jui Hsieh. 2021. Fast and Complete: Enabling Complete Neural Network Verification with Rapid and Massively Parallel Incomplete Verifiers. InInternational Conference on Learning Representations. https://openreview.net/forum?id=nVZtXBI6LNn

  67. [67]

    Zhiying Xu, Francis Y Yan, Rachee Singh, Justin T Chiu, Alexander M Rush, and Minlan Yu. 2023. Teal: Learning- accelerated optimization of wan traffic engineering. InProceedings of the ACM SIGCOMM 2023 Conference. 378–393

  68. [68]

    Siyu Yan, Xiaoliang Wang, Xiaolong Zheng, Yinben Xia, Derui Liu, and Weishan Deng. 2021. ACC: Automatic ECN tuning for high-speed datacenter networks. InProceedings of the 2021 ACM SIGCOMM 2021 Conference. 384–397. Proc. ACM Meas. Anal. Comput. Syst., Vol. 10, No. 2, Article 29. Publication date: June 2026. Analyzing Symbolic Properties for DRL Agents in ...

  69. [69]

    Mohammad Zangooei, Morteza Golkarifard, Mohamed Rouili, Niloy Saha, and Raouf Boutaba. 2023. Flexible RAN Slicing in Open RAN With Constrained Multi-Agent Reinforcement Learning.IEEE Journal on Selected Areas in Communications(2023)

  70. [70]

    Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. 2018. Efficient neural network robustness certification with general activation functions.Advances in neural information processing systems31 (2018). A Probability Calculation Details for the Aurora Case Study Let 𝑌1 ∼ N (𝐿 1 0, 𝜎2) and 𝑌2 ∼ N (𝐿 2 0, 𝜎2) represent the selected action...