ABC-Bench evaluates LLM agents on three biosecurity-relevant biology tasks and reports that agents outperformed median human experts, with wet-lab confirmation of successful DNA assembly by one model.
Title resolution pending
One Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AI (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
ABC-Bench: An Agentic Bio-Capabilities Benchmark for Biosecurity