pith. machine review for the scientific record.
Pith Number

pith:3T7I6SRL

pith:2023:3T7I6SRLYYQSMS2HDBIITTL7EY
not attested · not anchored · not stored · refs resolved

UltraFeedback: Boosting Language Models with Scaled AI Feedback

Bingxiang He, Ganqu Cui, Guanming Yao, Guotong Xie, Lifan Yuan, Maosong Sun, Ning Ding, Ruobing Xie, Wei Zhu, Yankai Lin, Yuan Ni, Zhiyuan Liu

A dataset of over one million GPT-4 feedbacks enables effective alignment of LLaMA-based chat models.

arxiv:2310.01377 v2 · 2023-10-02 · cs.CL · cs.AI · cs.LG

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

Built upon UltraFeedback, we align a LLaMA-based model by best-of-n sampling and reinforcement learning, demonstrating its exceptional performance on chat benchmarks.

C2 weakest assumption

That the series of techniques applied to mitigate annotation biases in GPT-4 feedback produces sufficiently reliable and unbiased signals for effective model alignment.

C3 one line summary

UltraFeedback is a large-scale AI feedback dataset that enables effective alignment of open-source language models, yielding strong results on chat benchmarks.

References

14 extracted · 14 resolved · 2 Pith anchors

Formal links

2 machine-checked theorem links

Cited by

22 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:13.586464Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

dcfe8f4a2bc621264b47185089cd7f26248b30c7e9609908cb419894e397dff4

Aliases

arxiv: 2310.01377 · arxiv_version: 2310.01377v2 · doi: 10.48550/arxiv.2310.01377 · pith_short_12: 3T7I6SRLYYQS · pith_short_16: 3T7I6SRLYYQSMS2H · pith_short_8: 3T7I6SRL
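The aliases on this record appear to follow two simple rules: each pith_short_N value is the first N characters of the full Pith Number, and the full Pith Number matches the unpadded Base32 encoding of the first 16 bytes of the canonical hash. The sketch below checks both against the values shown on this page. These rules are inferred from this one record, not taken from the Pith schema, and the function names are hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "dcfe8f4a2bc621264b47185089cd7f26248b30c7e9609908cb419894e397dff4"
PITH_NUMBER = "3T7I6SRLYYQSMS2HDBIITTL7EY"


def pith_number_from_hash(hash_hex: str) -> str:
    """Assumed rule: Base32 (RFC 4648) of the first 16 hash bytes, padding stripped."""
    first_16_bytes = bytes.fromhex(hash_hex)[:16]
    return base64.b32encode(first_16_bytes).decode("ascii").rstrip("=")


def short_alias(pith_number: str, length: int) -> str:
    """Assumed rule: a short alias is a plain prefix of the full Pith Number."""
    return pith_number[:length]


if __name__ == "__main__":
    print(pith_number_from_hash(CANONICAL_HASH))  # 3T7I6SRLYYQSMS2HDBIITTL7EY
    print(short_alias(PITH_NUMBER, 8))            # 3T7I6SRL
    print(short_alias(PITH_NUMBER, 12))           # 3T7I6SRLYYQS
    print(short_alias(PITH_NUMBER, 16))           # 3T7I6SRLYYQSMS2H
```

If the inference holds for other records, a short alias can be checked against a canonical hash offline, without querying the service.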
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/3T7I6SRLYYQSMS2HDBIITTL7EY \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: dcfe8f4a2bc621264b47185089cd7f26248b30c7e9609908cb419894e397dff4
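The hashing step of the command above can also be written as a small standalone function, which makes the canonicalization easier to read: keys are sorted, separators carry no whitespace, and non-ASCII characters are kept as-is and encoded as UTF-8 before hashing. This is a minimal sketch that mirrors the serialization settings in the one-liner; the function name canonical_sha256 and the toy record are illustrative only, and the code has not been run against the live endpoint here.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the same JSON settings as the shell one-liner."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order in the input does not matter
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters unescaped
    ).encode()                   # str.encode() defaults to UTF-8
    return hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    # Toy records (not the real canonical record): same content, different key order.
    first = {"source": {"kind": "arxiv", "id": "2310.01377"}, "schema_version": "1.0"}
    second = {"schema_version": "1.0", "source": {"id": "2310.01377", "kind": "arxiv"}}
    print(canonical_sha256(first) == canonical_sha256(second))  # True
```

To verify this record, pass the parsed canonical_record object from the API response to the function and compare the result with the canonical hash listed on this page.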
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "1d36aa47da97202909f564bcf2fd99c5f68f7c70f1de301a52bfcd55c832cdff",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by-sa/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2023-10-02T17:40:01Z",
    "title_canon_sha256": "b7b8be285286f3dd7d47544a7033add9fc57876b36c4cf43b92d8ac8f1cd2f66"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2310.01377",
    "kind": "arxiv",
    "version": 2
  }
}