arxiv: 2604.18233 · v1 · submitted 2026-04-20 · 💻 cs.MA · cs.AI

Aether: Network Validation Using Agentic AI and Digital Twin

Pith reviewed 2026-05-10 03:33 UTC · model grok-4.3

keywords network change validation · agentic AI · digital twin · network operations · error detection · network verification · AI agents

The pith

Aether automates network change validation by orchestrating five AI agents over a unified digital twin for intent analysis, verification, and testing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Aether as a system that pairs generative agentic AI with a multi-functional network digital twin to handle the full lifecycle of validating network changes automatically. Traditional approaches rely on scattered manual tools that miss errors until after deployment, while formal methods stay offline and struggle with live production updates. Aether's agents work together using consistent modeling, simulation, and emulation from the twin to check changes rapidly. Tests on synthetic scenarios and real ISP incidents show complete error detection, strong diagnostic coverage, and completion in six to seven minutes. This approach matters if it can move validation from error-prone manual steps to reliable, fast automation in operational networks.

Core claim

Aether integrates Generative Agentic AI with a multi-functional Network Digital Twin to automate and streamline network change validation workflows, featuring an agentic architecture with five specialized Network Operations AI agents that collaboratively handle the change validation lifecycle from intent analysis to network verification and testing, while using a unified digital twin integrating modeling, simulation, and emulation to maintain a consistent, up-to-date network view.

What carries the argument

Five specialized Network Operations AI agents orchestrated atop a unified Network Digital Twin that integrates modeling, simulation, and emulation for consistent verification.
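
A minimal sketch of this arrangement, assuming a simple sequential hand-off: each stage reads the same twin snapshot and appends to a shared context. The stage names, `TwinView`, and `ValidationContext` are hypothetical placeholders chosen for illustration; the summary above does not name the five agents or describe their interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TwinView:
    """Hypothetical stand-in for the unified digital twin: one shared snapshot."""
    snapshot_id: str
    device_configs: Dict[str, str]


@dataclass
class ValidationContext:
    """State handed from stage to stage; every stage sees the same twin view."""
    change_request: str
    twin: TwinView
    notes: List[str] = field(default_factory=list)


Stage = Callable[[ValidationContext], None]


def make_stage(name: str) -> Stage:
    # Placeholder stage: a real agent would call an LLM and twin tools here.
    def run(ctx: ValidationContext) -> None:
        ctx.notes.append(f"{name} used snapshot {ctx.twin.snapshot_id}")
    return run


# Five assumed stage names; the paper's actual agent roles may differ.
STAGE_NAMES = [
    "intent_analysis",
    "impact_assessment",
    "test_planning",
    "verification",
    "testing",
]


def run_pipeline(ctx: ValidationContext) -> ValidationContext:
    """Run every stage in order against the same context and twin snapshot."""
    for name in STAGE_NAMES:
        make_stage(name)(ctx)
    return ctx


ctx = run_pipeline(
    ValidationContext(
        change_request="add BGP neighbor on r1",
        twin=TwinView(snapshot_id="snap-001", device_configs={"r1": "..."}),
    )
)
print("\n".join(ctx.notes))
```

Passing one context object through all stages is the simplest way to make the "consistent view" property visible: no stage can silently work from a different snapshot than its predecessor.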

If this is right

  • Validation of network changes becomes automated from intent analysis through testing, reducing reliance on scattered manual tools.
  • Error detection reaches 100 percent in synthetic change scenarios and past ISP incidents.
  • Diagnostic coverage reaches 92-96 percent, and validation completes in 6-7 minutes, which the paper reports as faster than traditional methods.
  • The approach supports continuous changes in live production environments rather than only offline pre-deployment checks.
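
The two headline metrics can be made concrete with a small sketch. The definitions below are assumptions made for illustration: error detection is taken as the fraction of faulty scenarios in which anything was flagged, and diagnostic coverage as the share of ground-truth issues that appear in the reports. The abstract does not define diagnostic coverage, and the toy data is invented rather than taken from the paper.

```python
from typing import List, Set


def error_detection_rate(flagged: List[bool]) -> float:
    """Fraction of faulty scenarios in which the validator raised any error."""
    return sum(flagged) / len(flagged)


def diagnostic_coverage(expected: List[Set[str]], reported: List[Set[str]]) -> float:
    """Assumed definition: share of ground-truth issues found in the reports."""
    total = sum(len(e) for e in expected)
    found = sum(len(e & r) for e, r in zip(expected, reported))
    return found / total


# Made-up toy data, one entry per faulty scenario (not results from the paper).
expected = [
    {"bgp_session_down", "route_leak"},
    {"acl_blocks_mgmt"},
    {"mtu_mismatch"},
]
reported = [
    {"bgp_session_down"},
    {"acl_blocks_mgmt"},
    {"mtu_mismatch", "extra_warning"},
]
flagged = [bool(r) for r in reported]

print(error_detection_rate(flagged))
print(diagnostic_coverage(expected, reported))
```

Under these definitions a system can detect every faulty change while still missing some of the individual issues behind it, which is one way a 100 percent detection figure and a 92-96 percent coverage figure could coexist.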

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This method could extend validation to other infrastructure domains that use digital twins, such as cloud or data center networks.
  • Wider use might shift network operations toward routine automated checks, freeing engineers for complex design tasks.
  • Integration with existing monitoring systems could allow real-time twin updates from live telemetry feeds.

Load-bearing premise

The five specialized agents will collaborate reliably without introducing conflicts, and the digital twin will remain sufficiently accurate and current to verify live production changes without creating new undetected errors.

What would settle it

Deployment on a live production network where a change error goes undetected by Aether but is later found by manual testing or where the digital twin diverges from actual device states during verification.
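
The second failure condition, divergence between the twin and actual device state, could be checked with a comparison along these lines. This is a sketch under assumptions: state is modelled as a hypothetical device-to-attributes dictionary, and `find_drift` is an illustrative helper, not part of Aether.

```python
from typing import Dict, List

State = Dict[str, Dict[str, str]]  # device -> attribute -> value (assumed shape)


def find_drift(twin_state: State, live_state: State) -> List[str]:
    """Return readable differences between twin state and live device state."""
    diffs: List[str] = []
    for device in sorted(set(twin_state) | set(live_state)):
        twin_attrs = twin_state.get(device)
        live_attrs = live_state.get(device)
        if twin_attrs is None:
            diffs.append(f"{device}: missing from twin")
            continue
        if live_attrs is None:
            diffs.append(f"{device}: missing from live network")
            continue
        # Compare attribute by attribute, including keys present on one side only.
        for key in sorted(set(twin_attrs) | set(live_attrs)):
            if twin_attrs.get(key) != live_attrs.get(key):
                diffs.append(
                    f"{device}.{key}: twin={twin_attrs.get(key)!r} "
                    f"live={live_attrs.get(key)!r}"
                )
    return diffs


twin = {"r1": {"bgp_as": "65001", "mtu": "1500"}, "r2": {"bgp_as": "65002"}}
live = {"r1": {"bgp_as": "65001", "mtu": "9000"}, "r3": {"bgp_as": "65003"}}
drift = find_drift(twin, live)
print("\n".join(drift))
```

An empty result would mean the twin matched the sampled live state at that moment; any entry is a candidate explanation for a verification result that does not hold in production.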

Figures

Figures reproduced from arXiv: 2604.18233 by Giovanna Carofiglio (1), Giulio Grassi (1), John Kenneth d'Souza (2), Jordan Auge (1), Martin Gysi (2), Sam Betts (1) ((1) Cisco Systems, (2) Swisscom).

Figure 1: Network Change Validation process.
Adjacent text from the paper: IV. AETHER. Aether combines (i) a unified Network Digital Twin (NDT) with (ii) a suite of specialized AI agents driving the network change validation process by interacting with the user and with the NDT. The NDT acts as the core representation and tool for network validation: it is composed of • a Network Digital Map (NDM), i.e. a network representation in the form of a …

Figure 3: Agents logical architecture.
Adjacent text from the paper: A. Aether Agents. Aether employs a multi-agent architecture to automate network change validation, represented as external entities in …

Figure 2: Aether workflow.
Adjacent text from the paper: Aether workflow is described in Fig. 2: network telemetry data from production network is periodically ingested into the NDM to maintain an accurate up-to-date representation of the current network state and configuration. When a change request is initiated, Aether agents collaborate to analyze the change intent, assess its potential impact and generate tailored test plans, based on the int…

Figure 4: NDM Knowledge Graph structure.
Adjacent text from the paper: … they operate on accurate and up-to-date representations of the network state. Aether NDT fetches data from the NDM as needed and transforms it based on the requirements of the tools. It also provides computing capabilities to the NDM to enrich the internal network representation. 1) Network Digital Map: The Network Digital Map (NDM) serves as a unified data model and network …

Figure 5: NDT workflow to manage snapshots changes.

Figure 6: Single agent correctness.
Adjacent text from the paper: … higher Main Error Precision. Overall, the results show that Aether can accurately detect the most relevant and critical problems, enabling streamlined network change validation. The user remains in the loop and may validate Aether’s actions by analyzing the detailed final report and updating the ITSM ticket to improve test plan and verifications. Guardrails, agent monitoring as we…
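
The excerpts for Figures 2 and 5 mention periodic ingestion of network state and a workflow for managing snapshot changes, without giving details. One plausible reading, offered purely as an assumption, is that a proposed change is applied to a copy of the baseline snapshot so the baseline stays untouched for comparison. The `Snapshot` shape and `apply_change` helper below are hypothetical.

```python
import copy
from typing import Dict

Snapshot = Dict[str, Dict[str, str]]  # device -> attribute -> value (assumed shape)


def apply_change(baseline: Snapshot, change: Snapshot) -> Snapshot:
    """Return a candidate snapshot with the change applied; baseline is not mutated."""
    candidate = copy.deepcopy(baseline)
    for device, attrs in change.items():
        # Create the device entry if the change introduces a new device.
        candidate.setdefault(device, {}).update(attrs)
    return candidate


baseline = {"r1": {"mtu": "1500", "bgp_as": "65001"}}
change = {"r1": {"mtu": "9000"}, "r2": {"bgp_as": "65002"}}
candidate = apply_change(baseline, change)

print(baseline["r1"]["mtu"], candidate["r1"]["mtu"])
```

Keeping both snapshots lets later checks compare pre-change and post-change behaviour side by side instead of reasoning about the candidate in isolation.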
Original abstract

Network change validation remains a critical yet predominantly manual, time-consuming, and error-prone process in modern network operations. While formal network verification has made substantial progress in proving correctness properties, it is typically applied in offline, pre-deployment settings and faces challenges in accommodating continuous changes and validating live production behavior. Current operational approaches typically involve scattered testing tools, resulting in partial coverage and errors that surface only after deployment. In this paper, we present Aether, a novel approach that integrates Generative Agentic AI with a multi-functional Network Digital Twin to automate and streamline network change validation workflows. It features an agentic architecture with five specialized Network Operations AI agents that collaboratively handle the change validation lifecycle from intent analysis to network verification and testing. Aether agents use a unified Network Digital Twin integrating modeling, simulation, and emulation to maintain a consistent, up-to-date network view for verification and testing. By orchestrating agent collaboration atop this digital twin, Aether enables automated, rapid network change validation while reducing manual effort, minimizing errors, and improving operational agility and cost-effectiveness. We evaluate Aether over synthetic network change scenarios covering main classes of network changes and on past incidents from a major ISP operational network, demonstrating promising results in error detection (100%), diagnostic coverage (92-96%), and speed (6-7 minutes) over traditional methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper presents Aether, a system integrating Generative Agentic AI with a multi-functional Network Digital Twin for automating network change validation. It deploys five specialized agents that collaboratively manage the validation lifecycle from intent analysis through verification and testing, all operating atop a unified digital twin that combines modeling, simulation, and emulation to maintain a consistent network view. Evaluation is performed on synthetic scenarios covering main classes of network changes and on historical incidents from a major ISP operational network, with reported outcomes of 100% error detection, 92-96% diagnostic coverage, and 6-7 minute validation times compared to traditional methods.

Significance. If the results hold under more rigorous scrutiny, the work could meaningfully advance practical network operations by reducing manual, error-prone change validation in large-scale environments. The combination of agentic AI orchestration with a live digital twin represents a concrete step toward automated, continuous validation that goes beyond offline formal methods. The use of real ISP incident data is a positive empirical anchor, though the absence of detailed methodology limits immediate impact assessment.

major comments (3)
  1. [Abstract / Evaluation] Abstract and Evaluation section: The central quantitative claims (100% error detection, 92-96% diagnostic coverage, 6-7 minute speed) are presented without any description of the number of synthetic scenarios or incidents tested, the specific traditional-method baselines, statistical significance, or failure-mode analysis. These omissions make it impossible to assess whether the reported gains are robust or merely scenario-specific.
  2. [System Architecture] Digital Twin description: The manuscript states that the unified digital twin 'maintains a consistent, up-to-date network view' but provides no mechanisms, latency bounds, or refresh protocols for keeping the twin synchronized with live production state under continuous changes or partial updates. This is load-bearing for the live-validation claim yet remains unaddressed.
  3. [Agentic Architecture] Agent collaboration: The five specialized agents are described as collaborating on the validation lifecycle, but no details are given on inter-agent communication protocols, conflict resolution, consistency checks, or safeguards against hallucinations or missed interactions. The reported performance numbers rest on the untested assumption that this collaboration is reliable.
minor comments (2)
  1. [Abstract] The abstract uses the term 'diagnostic coverage' without a precise definition or formula; a short clarifying sentence would improve readability.
  2. [Digital Twin] No mention is made of how the digital twin handles emulation fidelity versus simulation speed trade-offs; a brief note on this engineering choice would help readers understand the implementation.
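
Major comment 2 asks for latency bounds on twin synchronization. A minimal sketch of what such a bound could look like in practice is a freshness guard that refuses to validate against a twin whose last ingest is too old. The five-minute threshold and the `twin_is_fresh` helper are assumptions for illustration; the manuscript, as summarized here, states no such bound.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical bound chosen for the example; not a value from the paper.
MAX_TWIN_AGE = timedelta(minutes=5)


def twin_is_fresh(last_ingest: datetime, now: datetime,
                  max_age: timedelta = MAX_TWIN_AGE) -> bool:
    """True when the last telemetry ingest is recent enough to trust the twin."""
    return now - last_ingest <= max_age


now = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
recent_ingest = now - timedelta(minutes=2)
stale_ingest = now - timedelta(minutes=30)

print(twin_is_fresh(recent_ingest, now), twin_is_fresh(stale_ingest, now))
```

A guard like this does not prove the twin is accurate, but it turns "up-to-date" into a testable condition that a validation run can report alongside its verdict.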

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed review. The comments identify important areas where the manuscript can be strengthened, particularly around the transparency of evaluation results and the completeness of architectural descriptions. We have revised the manuscript to address each point and provide additional details below.

Point-by-point responses
  1. Referee: [Abstract / Evaluation] Abstract and Evaluation section: The central quantitative claims (100% error detection, 92-96% diagnostic coverage, 6-7 minute speed) are presented without any description of the number of synthetic scenarios or incidents tested, the specific traditional-method baselines, statistical significance, or failure-mode analysis. These omissions make it impossible to assess whether the reported gains are robust or merely scenario-specific.

    Authors: We agree that the Evaluation section would benefit from greater explicitness to allow readers to judge robustness. While the manuscript described the use of synthetic scenarios covering main classes of network changes and historical ISP incidents, we acknowledge that exact counts, baseline specifications, statistical tests, and failure-mode analysis were not presented with sufficient detail. In the revised manuscript we have expanded the Evaluation section to report the precise number of synthetic scenarios and real incidents evaluated, to name the traditional baselines (manual validation processes and standard verification tools), to include statistical significance measures, and to add a failure-mode analysis. These additions directly support assessment of the reported 100% error detection, 92-96% diagnostic coverage, and 6-7 minute times. revision: yes

  2. Referee: [System Architecture] Digital Twin description: The manuscript states that the unified digital twin 'maintains a consistent, up-to-date network view' but provides no mechanisms, latency bounds, or refresh protocols for keeping the twin synchronized with live production state under continuous changes or partial updates. This is load-bearing for the live-validation claim yet remains unaddressed.

    Authors: We accept that the synchronization mechanisms were described only at a high level. The revised manuscript now includes an expanded subsection on the Network Digital Twin that specifies the event-driven synchronization approach, the latency bounds observed in our implementation, the refresh protocols, and the handling of partial updates. These additions clarify how the twin remains consistent with live production state and thereby supports the live-validation workflow. revision: yes

  3. Referee: [Agentic Architecture] Agent collaboration: The five specialized agents are described as collaborating on the validation lifecycle, but no details are given on inter-agent communication protocols, conflict resolution, consistency checks, or safeguards against hallucinations or missed interactions. The reported performance numbers rest on the untested assumption that this collaboration is reliable.

    Authors: We agree that the collaboration mechanics among the five agents required more elaboration. The revised manuscript adds a new subsection on Agent Collaboration that describes the inter-agent communication protocols, the conflict-resolution procedure, the consistency checks performed across agent outputs, and the safeguards implemented against hallucinations and missed interactions. These details provide the necessary grounding for the reliability of the reported performance results. revision: yes
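
The response above mentions a conflict-resolution procedure and consistency checks without describing them. As a sketch of one simple policy, assuming each agent emits a verdict, a conservative merge lets the most severe verdict win and lists the agents that disagreed so a human can review them. The `Verdict` levels, agent names, and both helpers are hypothetical and not taken from the paper.

```python
from enum import IntEnum
from typing import Dict, List


class Verdict(IntEnum):
    """Ordered by severity so that max() picks the most conservative outcome."""
    PASS = 0
    INCONCLUSIVE = 1
    FAIL = 2


def merge_verdicts(verdicts: Dict[str, Verdict]) -> Verdict:
    """Conservative merge: the most severe verdict wins."""
    return max(verdicts.values())


def dissenting_agents(verdicts: Dict[str, Verdict]) -> List[str]:
    """Agents whose verdict differs from the merged one, for human review."""
    merged = merge_verdicts(verdicts)
    return sorted(name for name, v in verdicts.items() if v != merged)


verdicts = {
    "verification": Verdict.PASS,
    "testing": Verdict.FAIL,
    "impact_assessment": Verdict.PASS,
}

print(merge_verdicts(verdicts).name, dissenting_agents(verdicts))
```

This policy trades false alarms for safety: a single failing agent blocks the change, and the dissent list shows where the agents' views of the network diverged.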

Circularity Check

0 steps flagged

No circularity in derivation chain; empirical results are direct measurements

The paper describes an architectural system (five specialized agents + unified digital twin) and reports empirical evaluation metrics (100% error detection, 92-96% diagnostic coverage, 6-7 min speed) on synthetic scenarios and historical ISP incidents. No equations, derivations, fitted parameters, or predictions are present. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing steps. The performance numbers are presented as measured outcomes rather than quantities defined in terms of the system itself, so no step reduces to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is a systems-engineering description with no mathematical derivations, fitted parameters, or new physical entities; the central claims rest on the assumption that agent collaboration and digital-twin fidelity are sufficient for production use.

pith-pipeline@v0.9.0 · 5575 in / 1221 out tokens · 49641 ms · 2026-05-10T03:33:38.957222+00:00 · methodology
