Nissist: An incident mitigation copilot based on troubleshooting guides

Kaikai An, Fangkai Yang, Junting Lu, Liqun Li, Zhixing Ren, Hao Huang, Lu Wang, Pu Zhao, Yu Kang, Hua Ding, et al. · 2024 · arXiv 2402.17531

3 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

StepFly: Agentic Troubleshooting Guide Automation for Incident Diagnosis

cs.AI · 2025-10-11 · conditional · novelty 7.0

StepFly automates TSG execution via TSG Mentor, LLM-based DAG extraction with QPPs, and a DAG-guided parallel scheduler, reaching 94% success on GPT-4.1 with 32.9-70.4% time savings on parallelizable guides.

TSGuard: Automated User-Centric Incident Diagnosis for AI Workloads in the Cloud

cs.SE · 2025-06-02 · unverdicted · novelty 5.0

TSGuard builds domain knowledge bases offline from historical incidents and applies online multi-agent structured reasoning to diagnose AI workload failures, delivering 19.8% higher accuracy and 63.4% lower verification time than baselines on Azure production data.

An End-to-End Framework for Building Large Language Models for Software Operations

cs.LG · 2026-04-06 · unverdicted · novelty 4.0 · 2 refs

OpsLLM is a domain-specific LLM for software ops QA and RCA built with human-curated data, SFT, and RL using a domain process reward model, showing accuracy gains of 0.2-5.7% on QA and 2.7-70.3% on RCA over general LLMs.
