AIOpsLab: A Holistic Framework to Evaluate AI Agents for Enabling Autonomous Clouds, 2025b
Three Pith papers cite this work. Polarity classification is still being indexed.
Citation summary: all three citing papers are from 2026, and none has received a verdict yet. By role, one citation is classified as background; by polarity, one is classified as background.
Citing papers
-
GSAR: Typed Grounding for Hallucination Detection and Recovery in Multi-Agent LLMs
GSAR is a grounding-evaluation framework for multi-agent LLMs that uses a four-way claim typology, evidence-weighted asymmetric scoring, and tiered recovery decisions to detect and mitigate hallucinations.
-
Ambig-IaC: Multi-level Disambiguation for Interactive Cloud Infrastructure-as-Code Synthesis
Ambig-IaC detects structural disagreements among LLM-generated IaC candidates along three hierarchical axes and turns them into clarification questions, improving structure accuracy by 18.4% and attribute accuracy by 25.4% on a new 300-task benchmark.
-
An End-to-End Framework for Building Large Language Models for Software Operations
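OpsLLM is a domain-specific LLM for software-operations question answering (QA) and root-cause analysis (RCA), built with human-curated data, supervised fine-tuning (SFT), and reinforcement learning (RL) guided by a domain process reward model. It shows accuracy gains of 0.2-5.7% on QA and 2.7-70.3% on RCA over general-purpose LLMs.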