pith. machine review for the scientific record.

Review history

arxiv: 2605.06856 · 2 revisions

Benchmarked Yet Not Measured -- Generative AI Should be Evaluated Against Real-World Utility

  1. 2026-05-12 UNVERDICTED LOW v0.9.0 novelty 4.0
    71658 ms 5533 in 1627 out 2026-05-12T04:27:00.871862+00:00
  2. 2026-05-11 UNVERDICTED LOW v0.9.0 novelty 5.0
    46540 ms 5533 in 1528 out 2026-05-11T01:17:34.111963+00:00