AgentBench: Evaluating

3 Pith papers cite this work. Polarity classification is still indexing.

Citing papers by year: 2026 (3)

representative citing papers

ABRA: Agent Benchmark for Radiology Applications

cs.CV · 2026-05-11 · unverdicted · novelty 8.0

ABRA shows that radiology agents excel at tool execution (89%+) but struggle with outcomes (0-25%). Supplying oracle perception raises outcomes to 69-100%, which identifies perception as the primary bottleneck.
