pith. machine review for the scientific record.

Review history

arxiv: 2605.07161 · 2 revisions

SREGym: A Live Benchmark for AI SRE Agents with High-Fidelity Failure Scenarios

  1. 2026-05-14 · UNVERDICTED · LOW · v0.9.0 · novelty 7.0
    45283 ms · 5532 in · 1094 out · 2026-05-14T21:44:20.410285+00:00
  2. 2026-05-11 · UNVERDICTED · LOW · v0.9.0 · novelty 5.0
    29518 ms · 5532 in · 1157 out · 2026-05-11T02:17:33.146837+00:00