pith:3FY2AOQX
BBQ: A Hand-Built Bias Benchmark for Question Answering
Question answering models rely on social stereotypes, showing higher accuracy when correct answers align with biases.
arxiv:2110.08193 v2 · 2021-10-15 · cs.CL
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{3FY2AOQXS2OUDBRMFLBJI4NJPB}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Models often rely on stereotypes when the context is under-informative, meaning the model's outputs consistently reproduce harmful biases in this setting. Though models are more accurate when the context provides an informative answer, they still rely on stereotypes and average up to 3.4 percentage points higher accuracy when the correct answer aligns with a social bias than when it conflicts, with this difference widening to over 5 points on examples targeting gender for most models tested.
The hand-constructed questions accurately capture attested real-world social biases in U.S. English contexts without introducing artificial patterns that models exploit differently from natural text.
BBQ is a new benchmark dataset showing that QA models often default to social stereotypes, achieving up to 3.4 points higher accuracy when the correct answer aligns with bias.
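The accuracy gap quoted in the claims above can be illustrated with a minimal sketch. It assumes each evaluated example records whether the model answered correctly and whether the correct answer aligns with the targeted social bias. The `Example` class, the function names, and the toy data are hypothetical and are not taken from the paper or its released code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    correct: bool        # did the model choose the gold answer?
    bias_aligned: bool   # does the gold answer match the stereotype?


def accuracy(examples: List[Example]) -> float:
    """Fraction of examples answered correctly."""
    return sum(e.correct for e in examples) / len(examples)


def bias_accuracy_gap(examples: List[Example]) -> float:
    """Accuracy on bias-aligned examples minus accuracy on
    bias-conflicting examples, in percentage points."""
    aligned = [e for e in examples if e.bias_aligned]
    conflicting = [e for e in examples if not e.bias_aligned]
    return 100.0 * (accuracy(aligned) - accuracy(conflicting))


# Toy data, invented for illustration only: 9/10 correct when the gold
# answer aligns with the bias, 8/10 correct when it conflicts.
toy = (
    [Example(correct=i < 9, bias_aligned=True) for i in range(10)]
    + [Example(correct=i < 8, bias_aligned=False) for i in range(10)]
)
gap = bias_accuracy_gap(toy)
print(f"{gap:.1f} percentage points")
```

A positive gap means the model is more accurate when the correct answer agrees with the stereotype, which is the pattern the claims describe.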
Receipt and verification
| First computed | 2026-05-17T23:38:46.772647Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
d971a03a17969d41862c2ac29471a9785ed277cfbff812406694dd2deb4ed2e9
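For the values on this page, the Pith Number appears to follow from the canonical hash: the first 26 characters of the Base32 encoding of the SHA-256 digest equal 3FY2AOQXS2OUDBRMFLBJI4NJPB, and the first 8 give the short form 3FY2AOQX. The page does not state this rule, so the sketch below is an observation about this one record, and the function name is hypothetical.

```python
import base64

# Canonical hash copied from this record.
CANONICAL_HASH_HEX = (
    "d971a03a17969d41862c2ac29471a9785ed277cfbff812406694dd2deb4ed2e9"
)


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes and keep a prefix.

    Assumption: the prefix lengths (26 and 8) are inferred from the
    identifiers shown on this page, not from documentation.
    """
    digest = bytes.fromhex(hex_digest)
    return base64.b32encode(digest).decode("ascii")[:length]


print(pith_number_from_hash(CANONICAL_HASH_HEX))      # full identifier
print(pith_number_from_hash(CANONICAL_HASH_HEX, 8))   # short form
```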
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/3FY2AOQXS2OUDBRMFLBJI4NJPB \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: d971a03a17969d41862c2ac29471a9785ed277cfbff812406694dd2deb4ed2e9
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "5680f90f58226f4faab044ff3bf59bd8a204c421364cd67b536c97895efb0746",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2021-10-15T16:43:46Z",
    "title_canon_sha256": "b6cb76fc550908048a9e2f435ac156c2f5acb9ce960aeaea505b7cd0c834ae47"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2110.08193",
    "kind": "arxiv",
    "version": 2
  }
}
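The canonicalization step in the shell one-liner above can also be written as a small standalone Python function, which is easier to reuse offline. This sketch mirrors the one-liner's `json.dumps` arguments (sorted keys, compact separators, non-ASCII kept as-is) and hashes the UTF-8 bytes. The function name `canonical_hash` is hypothetical. Whether the printed digest matches the published canonical hash depends on the served record being identical to the JSON shown here, which this sketch does not check.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """SHA-256 over a key-sorted, whitespace-free JSON serialization."""
    canonical = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode()  # UTF-8 by default
    return hashlib.sha256(canonical).hexdigest()


# The canonical record as displayed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "5680f90f58226f4faab044ff3bf59bd8a204c421364cd67b536c97895efb0746",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2021-10-15T16:43:46Z",
        "title_canon_sha256": "b6cb76fc550908048a9e2f435ac156c2f5acb9ce960aeaea505b7cd0c834ae47",
    },
    "schema_version": "1.0",
    "source": {"id": "2110.08193", "kind": "arxiv", "version": 2},
}

print(canonical_hash(record))
```

Because keys are sorted before hashing, the digest does not depend on the order in which the record's fields were written or parsed.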