Review history

arxiv: 2605.14133 · 2 revisions

ClawForge: Generating Executable Interactive Benchmarks for Command-Line Agents

  1. 2026-05-20 UNVERDICTED LOW v0.9.0 novelty 7.0
    43865 ms 5784 in 1273 out 2026-05-20T20:14:27.781613+00:00
  2. 2026-05-15 CONDITIONAL MODERATE v0.9.0 novelty 7.0
    35557 ms 5553 in 1138 out 2026-05-15T05:04:49.987617+00:00