Stratus: A multi-agent system for autonomous reliability engineering of modern clouds. arXiv preprint arXiv:2506.02009v2, 05 2025
5 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- OpenRCA 2.0: From Outcome Labels to Causal Process Supervision
  OpenRCA 2.0 is the first cross-system RCA benchmark with step-wise causal annotations, revealing that 11 frontier LLMs achieve 20.7% exact root-cause recovery and struggle with causal grounding (61.5% vs 76.0% ungrounded).
- Rebooting Microreboot: Architectural Support for Safe, Parallel Recovery in Microservice Systems
  A three-agent system with a microkernel and typed ISA enables safe parallel microreboots in microservices by inferring recovery boundaries from traces, achieving zero agent-caused harm in online tests.
- PRAXIS: Integrating Program Analysis with Observability for Root-Cause Analysis
  PRAXIS combines LLM-driven structured traversal of service dependency graphs and hammock-block program dependence graphs to improve root-cause analysis accuracy by up to 6.3x while cutting token consumption by 5.3x on 30 real-world cloud incidents.
- Experiment-as-Code Labs: A Declarative Stack for AI-Driven Scientific Discovery
  The paper introduces Experiment-as-Code Labs, a declarative stack that combines AI agents, systems orchestration, and physical lab control for AI-driven discovery.
- Bian Que: An Agentic Framework with Flexible Skill Arrangement for Online System Operations
  Bian Que is an agentic framework that uses a unified operational paradigm, flexible Skill Arrangement, and a self-evolving mechanism to automate O&M tasks, achieving a 75% alert reduction and an MTTR reduction of over 50% in production deployment.