Review history

arxiv: 2604.15774 · 2 revisions

MemEvoBench: Benchmarking Safety Risks from Memory Misevolution in LLM Agents

  1. 2026-05-22 UNVERDICTED LOW v0.9.0 novelty 8.0
    25253 ms 5733 in 1268 out 2026-05-22T10:07:04.648524+00:00
  2. 2026-05-10 UNVERDICTED LOW v0.9.0 novelty 7.0
    17524 ms 5499 in 984 out 2026-05-10T08:33:04.427000+00:00