Review history

arxiv: 2605.22664 · 2 revisions

MBABench: Evaluating LLM Agents on End-to-End Spreadsheet Tasks in Finance

  1. 2026-06-30 UNVERDICTED LOW v0.9.1-grok novelty 7.0
    53122 ms 5803 in 1442 out 2026-06-30T17:09:47.093516+00:00
  2. 2026-05-22 UNVERDICTED LOW v0.9.0 novelty 7.0
    59916 ms 5803 in 1441 out 2026-05-22T05:25:21.686697+00:00