pith.
Pith Number

pith:JQV66JLA

pith:2024:JQV66JLA35IBZFBTFZ3NRYLFR7
not attested · not anchored · not stored · refs resolved

LAB-Bench: Measuring Capabilities of Language Models for Biology Research

Andrew D. White, Jon M. Laurent, Joseph D. Janizek, Manvitha Ponnapati, Michaela M. Hinks, Michael J. Hammerling, Michael Ruzo, Samuel G. Rodriques, Siddharth Narayanan

LAB-Bench introduces over 2,400 questions to test AI on practical biology research tasks such as literature search and sequence manipulation.

arxiv:2407.10362 v3 · 2024-07-14 · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{JQV66JLA35IBZFBTFZ3NRYLFR7}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
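The page does not spell out the merge algorithm. Purely as an illustration, here is a minimal Python sketch of one way a deterministic merge can work. It assumes each event carries a unique id, a UTC ISO-8601 timestamp in one fixed format, and a single key/value update. The `Event` type, its fields, the `author_claim` key and the last-writer-wins rule are all hypothetical, and signature checking is omitted.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Event:
    event_id: str   # hypothetical unique id, e.g. a hash of the signed payload
    timestamp: str  # hypothetical: UTC ISO-8601, same format for every event
    key: str        # hypothetical: the top-level state key this event sets
    value: Any


def merge(canonical: dict, events: list) -> dict:
    """Fold events onto a copy of the canonical record in one fixed total order.

    Sorting by (timestamp, event_id) makes the result independent of the order
    in which a mirror received the events. Signature checks are omitted here.
    """
    state = dict(canonical)  # shallow copy; the canonical record is left unchanged
    for ev in sorted(events, key=lambda e: (e.timestamp, e.event_id)):
        state[ev.key] = ev.value  # later events in the total order win
    return state


canonical = {"schema_version": "1.0"}
a = Event("e1", "2026-05-17T00:00:00Z", "author_claim", "open")
b = Event("e2", "2026-05-18T00:00:00Z", "author_claim", "claimed")
print(merge(canonical, [b, a]))  # {'schema_version': '1.0', 'author_claim': 'claimed'}
```

Because the events are sorted into a total order before folding, two mirrors holding the same events in different arrival orders compute the same state; the event id breaks ties between identical timestamps.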

Claims

C1 · strongest claim

An AI system that can achieve consistently high scores on the more difficult LAB-Bench tasks would serve as a useful assistant for researchers in areas such as literature search and molecular cloning.

C2 · weakest assumption

The multiple-choice questions in LAB-Bench accurately reflect the practical capabilities required for real-world biology research tasks, rather than testing only surface-level pattern matching.

C3 · one-line summary

LAB-Bench provides over 2,400 multiple-choice questions to measure LLM performance on real biology research tasks like literature recall, figure reading, database access, and sequence manipulation, with initial results compared against human expert biologists.

References

59 extracted · 59 resolved · 3 Pith anchors

[1] Joanna S. Amberger, Carol A. Bocchini, François Schiettecatte, Alan F. Scott, and Ada Hamosh. OMIM.org: Online Mendelian Inheritance in Man (OMIM®), an online catalog of human genes and genetic disorders. 2015.
[2] Introducing the next generation of Claude. March 2024.
[3] Introducing the next generation of Claude. March 2024.
[4] Lessons from the Trenches on Reproducible Evaluation of Language Models. 2024. arXiv:2405.14782.
[5] Autonomous chemical research with large language models. 2023. doi:10.1038/s41586-023-06792-0.

Formal links

3 machine-checked theorem links

Cited by

26 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:47.379162Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

4c2bef2560df501c94332e76d8e1658fe77d63b751508f4118a5d4e623f63c80

Aliases

arxiv: 2407.10362 · arxiv_version: 2407.10362v3 · doi: 10.48550/arxiv.2407.10362 · pith_short_12: JQV66JLA35IB · pith_short_16: JQV66JLA35IBZFBT · pith_short_8: JQV66JLA
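The identifiers on this page appear to fit together simply: decoding the canonical hash from hex and encoding the raw digest in RFC 4648 base32 yields a string whose first 26 characters (the first 130 bits of the digest) are JQV66JLA35IBZFBTFZ3NRYLFR7, and the pith_short_8, pith_short_12 and pith_short_16 aliases are prefixes of that string. The Python sketch below checks this for this one record. Treating it as the general Pith rule is an assumption, and the function names are hypothetical.

```python
import base64


def pith_number_from_hash(canonical_hash_hex: str, length: int = 26) -> str:
    """Hypothetical: base32-encode the SHA-256 digest and keep the first `length` chars."""
    digest = bytes.fromhex(canonical_hash_hex)           # 32 raw bytes
    encoded = base64.b32encode(digest).decode("ascii")   # RFC 4648 alphabet A-Z, 2-7
    return encoded[:length]                              # 26 chars = first 130 bits


def short_aliases(pith_number: str) -> dict:
    """The pith_short_N aliases listed on this page are the first N characters."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


HASH = "4c2bef2560df501c94332e76d8e1658fe77d63b751508f4118a5d4e623f63c80"
number = pith_number_from_hash(HASH)
print(number)                 # JQV66JLA35IBZFBTFZ3NRYLFR7
print(short_aliases(number))  # prefixes of length 8, 12 and 16
```

Since each base32 character carries 5 bits, the 8-, 12- and 16-character aliases correspond to the first 40, 60 and 80 bits of the same digest.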
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/JQV66JLA35IBZFBTFZ3NRYLFR7 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 4c2bef2560df501c94332e76d8e1658fe77d63b751508f4118a5d4e623f63c80
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "93987a7bf6ec82cff30bd36782bcb0930d5cc6ddbca93afde4947e0547ac096e",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by-sa/4.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2024-07-14T23:52:25Z",
    "title_canon_sha256": "e1e688186ac8a564ee4148b596d36e6270602308f20fb0f5c063dad5750372a3"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2407.10362",
    "kind": "arxiv",
    "version": 3
  }
}
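The Python step of the curl pipeline above can also be run offline as a small function. This sketch applies the same serialization the pipeline uses (sorted keys, compact separators, ensure_ascii=False, UTF-8 encoding) and returns the SHA-256 hex digest. The name `canonical_sha256` is hypothetical, and reproducing the page's canonical hash this way assumes the JSON shown above is exactly the `canonical_record` the API returns.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a JSON object using the serialization from the verification pipeline."""
    blob = json.dumps(
        record,
        sort_keys=True,         # input key order does not affect the hash
        separators=(",", ":"),  # no whitespace between tokens
        ensure_ascii=False,     # keep non-ASCII characters as raw UTF-8
    ).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


RECORD = {
    "metadata": {
        "abstract_canon_sha256": "93987a7bf6ec82cff30bd36782bcb0930d5cc6ddbca93afde4947e0547ac096e",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by-sa/4.0/",
        "primary_cat": "cs.AI",
        "submitted_at": "2024-07-14T23:52:25Z",
        "title_canon_sha256": "e1e688186ac8a564ee4148b596d36e6270602308f20fb0f5c063dad5750372a3",
    },
    "schema_version": "1.0",
    "source": {"id": "2407.10362", "kind": "arxiv", "version": 3},
}

print(canonical_sha256(RECORD))  # compare with the Canonical hash listed on this page
```

Sorting keys and fixing separators is what makes the digest independent of how a client happened to order or pretty-print the JSON.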