Pith Number

pith:MTHKQKRG

pith:2022:MTHKQKRGMSXMB7W4XJJO5C66RV
Status: not attested · not anchored · not stored · refs resolved

Improving alignment of dialogue agents via targeted human judgements

Abigail See, Amelia Glaese, Boxi Wu, Charlie Chen, Demis Hassabis, Doug Fritz, Fan Yang, Geoffrey Irving, Iason Gabriel, Jaume Sanchez Elias, John Aslanides, John Mellor, Jonathan Uesato, Koray Kavukcuoglu, Laura Weidinger, Lisa Anne Hendricks, Lucy Campbell-Gillingham, Maja Trębacz, Maribeth Rauh, Martin Chadwick, Nat McAleese, Nicholas Fernando, Phoebe Thacker, Po-Sen Huang, Rachel Foley, Ramona Comanescu, Richard Green, Rory Greig, Soňa Mokrá, Sumanth Dathathri, Susannah Young, Timo Ewalds, Vlad Firoiu, William Isaac

The Sparrow dialogue agent uses separate human judgments on natural-language rules, together with evidence citations, to outperform baselines in human preference and safety.

arxiv:2209.14375 v1 · 2022-09-28 · cs.LG · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{MTHKQKRGMSXMB7W4XJJO5C66RV}

This prints a linked badge after your title and injects PDF metadata; it compiles on arXiv.

Record completeness

1. Bitcoin timestamp
2. Internet Archive
3. Author claim: open
4. Citations: open
5. Replications: open

Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
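The page does not describe the merge algorithm or the event schema, so the following is only a minimal sketch of one common way to make such a merge deterministic: take the union of all events, put them in a total order on (timestamp, event_id), and replay them with last-writer-wins. The `Event` fields, the `merge` function, and the `author_claim` key are all hypothetical, and signature checking is left out.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape: the bundle's real schema is not shown on this page.
    timestamp: str  # ISO-8601 UTC in one fixed format, so string order == time order
    event_id: str   # unique ID (e.g. a hash of the signed event); breaks timestamp ties
    key: str        # record field the event sets (hypothetical)
    value: str


def merge(canonical: dict, *streams: Iterable[Event]) -> dict:
    """Fold every event from every stream into a copy of the canonical record.

    Assuming event_id uniquely identifies an event, the result depends only on
    the set of events, not on stream order or arrival order.
    Signature verification is deliberately omitted from this sketch.
    """
    # Deduplicate by event_id so an event mirrored in several streams applies once.
    unique = {e.event_id: e for stream in streams for e in stream}
    state = dict(canonical)  # copy; the canonical record itself is never mutated
    # A total order on (timestamp, event_id) makes replay deterministic;
    # later events overwrite earlier ones (last-writer-wins).
    for e in sorted(unique.values(), key=lambda ev: (ev.timestamp, ev.event_id)):
        state[e.key] = e.value
    return state


mirror_a = [Event("2026-05-17T23:39:22Z", "e1", "author_claim", "open")]
mirror_b = [Event("2026-05-18T08:00:00Z", "e2", "author_claim", "claimed"), mirror_a[0]]
print(merge({}, mirror_a, mirror_b) == merge({}, mirror_b, mirror_a))  # True
print(merge({}, mirror_a, mirror_b))  # {'author_claim': 'claimed'}
```

Two mirrors that hold the same events in different orders recompute the same state, which is the property the paragraph above describes; a real implementation would also reject events whose signatures fail to verify.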

Claims

C1 · strongest claim

Sparrow is preferred more often than baselines while being more resilient to adversarial probing by humans, violating our rules only 8% of the time when probed.

C2 · weakest assumption

That separate human judgments on the listed natural language rules reliably capture the intended notions of helpfulness, correctness, and harmlessness without introducing new biases or inconsistencies.

C3 · one-line summary

Sparrow uses targeted rule-based human feedback and evidence provision to outperform baselines in preference while violating rules only 8% of the time under adversarial probing.

References

15 extracted · 15 resolved · 4 Pith anchors


Formal links

2 machine-checked theorem links

Cited by

41 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:39:22.267587Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash (SHA-256)

64cea82a2664aec0fedcba52ee8bde8d4838e779018cbd75f2e3cf54ad55cfff

Aliases

arxiv: 2209.14375
arxiv_version: 2209.14375v1
doi: 10.48550/arxiv.2209.14375
pith_short_8: MTHKQKRG
pith_short_12: MTHKQKRGMSXM
pith_short_16: MTHKQKRGMSXMB7W4
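On this record, the Pith number MTHKQKRGMSXMB7W4XJJO5C66RV equals the first 26 characters of the RFC 4648 base32 encoding of the canonical hash 64cea82a…cfff, and each pith_short_N alias is its first N characters. The page does not document this derivation, so the sketch below (function names hypothetical) only reproduces the relationship observed here and assumes it holds in general.

```python
import base64


def pith_number_from_hash(canonical_sha256_hex: str, length: int = 26) -> str:
    """First `length` characters of the base32 (RFC 4648) encoding of the digest.

    Assumption: mirrors the relationship observed on this record only.
    """
    digest = bytes.fromhex(canonical_sha256_hex)    # 32 raw bytes of the SHA-256 digest
    b32 = base64.b32encode(digest).decode("ascii")  # alphabet A-Z, 2-7, '='-padded
    return b32[:length]                             # 26 chars cover the first 130 bits


def short_aliases(pith_number: str) -> dict:
    # The pith_short_N aliases listed above are plain N-character prefixes.
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


h = "64cea82a2664aec0fedcba52ee8bde8d4838e779018cbd75f2e3cf54ad55cfff"
num = pith_number_from_hash(h)
print(num)  # MTHKQKRGMSXMB7W4XJJO5C66RV
print(short_aliases(num))
```

Because 26 base32 characters carry 130 bits, the number is a truncation of the hash rather than a padded encoding of a whole number of bytes.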
Agent API
Verify this Pith Number yourself
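# Fetch the JSON-LD record, keep only canonical_record, re-serialize it with sorted keys and compact separators, and print its SHA-256 digest.
curl -sH 'Accept: application/ld+json' https://pith.science/pith/MTHKQKRGMSXMB7W4XJJO5C66RV \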
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 64cea82a2664aec0fedcba52ee8bde8d4838e779018cbd75f2e3cf54ad55cfff
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "04e14b3529d962d858e47fc8446a8e656662bf3ecfafa19e8b3eedf48831a137",
    "cross_cats_sorted": [
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2022-09-28T19:04:43Z",
    "title_canon_sha256": "d2be7c9a82e3da3903cadd379c977e7e799ce3051361fe1ce7286e483432fd5b"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2209.14375",
    "kind": "arxiv",
    "version": 1
  }
}
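An offline sketch of the same canonicalization, assuming the JSON above is the complete canonical_record that the API returns: it re-serializes the record exactly as the verification one-liner does (sorted keys, compact separators, UTF-8 without ASCII escaping) and hashes it with SHA-256. If that assumption holds, the printed digest should equal the canonical hash published on this page. The function names are illustrative.

```python
import hashlib
import json


def canonical_bytes(record: dict) -> bytes:
    # Same serialization as the one-liner: keys sorted at every nesting level,
    # no whitespace between tokens, non-ASCII characters kept as raw UTF-8.
    return json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def canonical_sha256(record: dict) -> str:
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


# The canonical record as displayed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "04e14b3529d962d858e47fc8446a8e656662bf3ecfafa19e8b3eedf48831a137",
        "cross_cats_sorted": ["cs.CL"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2022-09-28T19:04:43Z",
        "title_canon_sha256": "d2be7c9a82e3da3903cadd379c977e7e799ce3051361fe1ce7286e483432fd5b",
    },
    "schema_version": "1.0",
    "source": {"id": "2209.14375", "kind": "arxiv", "version": 1},
}

print(canonical_sha256(record))
```

Sorting keys is what makes the hash independent of how a server or mirror happens to order fields; the compact separators remove the other source of byte-level variation.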