Beyond the imitation game: Quantifying and extrapolating the capabilities of language models
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.AI (1)
Years: 2024 (1)
Verdicts: ACCEPT (1)
Representative citing papers
LAB-Bench: Measuring Capabilities of Language Models for Biology Research
LAB-Bench provides over 2,400 multiple-choice questions to measure LLM performance on real biology research tasks like literature recall, figure reading, database access, and sequence manipulation, with initial results compared against human expert biologists.