arXiv: 2605.07161 · 2 revisions
SREGym: A Live Benchmark for AI SRE Agents with High-Fidelity Failure Scenarios