pith. machine review for the scientific record.

Review history

arxiv: 2605.08828 · 2 revisions

When Agents Overtrust Environmental Evidence: An Extensible Agentic Framework for Benchmarking Evidence-Grounding Defects in LLM Agents

  1. 2026-05-13 · UNVERDICTED · LOW · v0.9.0 · novelty 6.0
    67778 ms · 5577 in · 1435 out · 2026-05-13T06:56:05.044775+00:00
  2. 2026-05-12 · UNVERDICTED · LOW · v0.9.0 · novelty 6.0
    45176 ms · 5577 in · 1201 out · 2026-05-12T02:57:29.632605+00:00