Pith Number

pith:3FY2AOQX

pith:2021:3FY2AOQXS2OUDBRMFLBJI4NJPB
not attested · not anchored · not stored · refs resolved

BBQ: A Hand-Built Bias Benchmark for Question Answering

Alicia Parrish, Angelica Chen, Jana Thompson, Jason Phang, Nikita Nangia, Phu Mon Htut, Samuel R. Bowman, Vishakh Padmakumar

Question answering models rely on social stereotypes, showing higher accuracy when correct answers align with biases.

arxiv:2110.08193 v2 · 2021-10-15 · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{3FY2AOQXS2OUDBRMFLBJI4NJPB}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

Models often rely on stereotypes when the context is under-informative, meaning the model's outputs consistently reproduce harmful biases in this setting. Though models are more accurate when the context provides an informative answer, they still rely on stereotypes and average up to 3.4 percentage points higher accuracy when the correct answer aligns with a social bias than when it conflicts, with this difference widening to over 5 points on examples targeting gender for most models tested.

C2 weakest assumption

The hand-constructed questions accurately capture attested real-world social biases in U.S. English contexts without introducing artificial patterns that models exploit differently from natural text.

C3 one line summary

BBQ is a hand-built benchmark dataset showing that QA models often default to social stereotypes, achieving up to 3.4 percentage points higher accuracy when the correct answer aligns with a social bias than when it conflicts.
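The accuracy gap quoted in C1 is a difference between two accuracies, expressed in percentage points. A minimal sketch of that arithmetic follows. The `Example` record, the function names, and the toy data are all hypothetical and invented for illustration; they are not the paper's data, code, or exact metric definition.

```python
from dataclasses import dataclass


@dataclass
class Example:
    correct: bool       # did the model choose the gold answer?
    bias_aligned: bool  # does the gold answer agree with the stereotype?


def accuracy(examples):
    """Fraction of examples answered correctly."""
    if not examples:
        raise ValueError("need at least one example")
    return sum(e.correct for e in examples) / len(examples)


def accuracy_gap_points(examples):
    """Accuracy on bias-aligned minus bias-conflicting examples, in percentage points."""
    aligned = [e for e in examples if e.bias_aligned]
    conflicting = [e for e in examples if not e.bias_aligned]
    return 100.0 * (accuracy(aligned) - accuracy(conflicting))


# Toy, made-up predictions: 9 of 10 correct when the answer aligns with a bias,
# 8 of 10 correct when it conflicts.
toy = (
    [Example(True, True)] * 9 + [Example(False, True)] * 1
    + [Example(True, False)] * 8 + [Example(False, False)] * 2
)

print(f"gap: {accuracy_gap_points(toy):.1f} percentage points")
```

A positive gap means the model is more accurate when the correct answer agrees with the stereotype, which is the pattern C1 describes.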

References

65 extracted · 65 resolved · 3 Pith anchors

[1] Kevin Bartz. 2009. https://blogs.iq.harvard.edu/english_first_n English first names for Chinese Americans. Harvard University Social Science Statistics Blog. Accessed July 2021. 2009
[2] Su Lin Blodgett, Solon Barocas, Hal Daumé III, and Hanna Wallach. 2020. https://aclanthology.org/2020.acl-main.485/ Language (technology) is power: A critical survey of "bias" in NLP. In Proceedin 2020
[5] Semantics derived automatically from language corpora contain human-like biases 2017 · doi:10.1126/science.aal4230
[7] Jorida Cila, Richard N Lalonde, Joni Y Sasaki, Raymond A Mar, and Ronda F Lo. 2021. https://psycnet.apa.org/fulltext/2020-69298-001.html Zahra or Zoe, Arjun or Andrew? Bicultural baby names reflect 2021
[8] Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge 2018 · arXiv:1803.05457

Formal links

2 machine-checked theorem links

Cited by

24 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:46.772647Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

d971a03a17969d41862c2ac29471a9785ed277cfbff812406694dd2deb4ed2e9

Aliases

arxiv: 2110.08193 · arxiv_version: 2110.08193v2 · doi: 10.48550/arxiv.2110.08193 · pith_short_12: 3FY2AOQXS2OU · pith_short_16: 3FY2AOQXS2OUDBRM · pith_short_8: 3FY2AOQX
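The short aliases above are prefixes of the full Pith Number. The full number on this page also matches the first 26 characters of the base32 encoding of the canonical hash. The sketch below checks both. The base32 relationship is inferred from the values shown here, not from a documented rule, and the helper names `short_aliases` and `base32_prefix` are hypothetical.

```python
import base64

# Values copied from this page.
canonical_hash = "d971a03a17969d41862c2ac29471a9785ed277cfbff812406694dd2deb4ed2e9"
pith_number = "3FY2AOQXS2OUDBRMFLBJI4NJPB"


def short_aliases(number, lengths=(8, 12, 16)):
    """Build pith_short_N aliases by taking the first N characters."""
    return {f"pith_short_{n}": number[:n] for n in lengths}


def base32_prefix(hex_digest, length=26):
    """RFC 4648 base32 of the digest bytes, truncated to `length` characters.

    Assumption: this mirrors how the number is derived; it is only
    an observation about the values on this page.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


aliases = short_aliases(pith_number)
for name, value in aliases.items():
    print(name, value)

print("number matches hash prefix:", base32_prefix(canonical_hash) == pith_number)
```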
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/3FY2AOQXS2OUDBRMFLBJI4NJPB \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: d971a03a17969d41862c2ac29471a9785ed277cfbff812406694dd2deb4ed2e9
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "5680f90f58226f4faab044ff3bf59bd8a204c421364cd67b536c97895efb0746",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2021-10-15T16:43:46Z",
    "title_canon_sha256": "b6cb76fc550908048a9e2f435ac156c2f5acb9ce960aeaea505b7cd0c834ae47"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2110.08193",
    "kind": "arxiv",
    "version": 2
  }
}
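The shell pipeline under "Verify this Pith Number yourself" can be mirrored offline in Python. The sketch below applies the same serialization settings (sorted keys, compact separators, `ensure_ascii=False`) to a copy of the record shown above and compares the digest with the canonical hash on this page. It assumes the displayed JSON is exactly what the API returns as `canonical_record`, which has not been checked here. The function name `canonical_sha256` is hypothetical.

```python
import hashlib
import json

# Copy of the canonical record as displayed on this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "5680f90f58226f4faab044ff3bf59bd8a204c421364cd67b536c97895efb0746",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2021-10-15T16:43:46Z",
    "title_canon_sha256": "b6cb76fc550908048a9e2f435ac156c2f5acb9ce960aeaea505b7cd0c834ae47"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2110.08193",
    "kind": "arxiv",
    "version": 2
  }
}
"""

# Canonical hash as printed on this page.
EXPECTED = "d971a03a17969d41862c2ac29471a9785ed277cfbff812406694dd2deb4ed2e9"


def canonical_sha256(record):
    """SHA-256 of the record serialized with sorted keys and no extra whitespace."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)
print(digest)
print("matches page hash:", digest == EXPECTED)
```

Because keys are sorted before hashing, the digest does not depend on the order in which a mirror stores the fields.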