Review history

arXiv: 2605.00817 · 2 revisions

When LLMs Stop Following Steps: A Diagnostic Study of Procedural Execution in Language Models

  1. 2026-05-22 UNVERDICTED LOW v0.9.0 novelty 7.0
    40464 ms 5685 in 1082 out 2026-05-22T10:00:21.774910+00:00
  2. 2026-05-09 UNVERDICTED LOW v0.9.0 novelty 6.0
    39056 ms 5454 in 1101 out 2026-05-09T18:48:26.279158+00:00