Review history

arxiv: 2606.02380 · 2 revisions

SPADE-Bench: Evaluating Spontaneous Strategic Deception in Agents via Plan-Action Divergence

  1. 2026-06-30 UNVERDICTED LOW v0.9.1-grok novelty 6.0
    26685 ms 5753 in 1087 out 2026-06-30T10:43:13.146546+00:00
  2. 2026-06-28 UNVERDICTED LOW v0.9.1-grok novelty 6.0
    20422 ms 5753 in 1302 out 2026-06-28T14:37:51.639069+00:00