Review history

arXiv: 2605.17439 · 2 revisions

DiagEval: Trajectory-Conditioned Diagnosis for Reliable Software Evaluation with GUI Agents

  1. 2026-05-20 UNVERDICTED LOW v0.9.0 novelty 6.0
    44846 ms 5829 in 1307 out 2026-05-20T12:59:54.123953+00:00
  2. 2026-05-19 CONDITIONAL LOW v0.9.0 novelty 7.0
    26050 ms 5829 in 1195 out 2026-05-19T23:06:42.504265+00:00