Pith Number

pith:ROU4N4IR

pith:2023:ROU4N4IRYZ3HV3M7HKTK3YY643
not attested · not anchored · not stored · refs resolved

Aligning Text-to-Image Models using Human Feedback

Craig Boutilier, Hao Liu, Kimin Lee, Mohammad Ghavamzadeh, Moonkyung Ryu, Olivia Watkins, Pieter Abbeel, Shixiang Shane Gu, Yuqing Du

Fine-tuning text-to-image models with human feedback improves accuracy on prompts specifying colors, counts, and backgrounds.

arxiv:2302.12192 v1 · 2023-02-23 · cs.LG · cs.AI · cs.CV

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{ROU4N4IRYZ3HV3M7HKTK3YY643}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

Our method generates objects with specified colors, counts and backgrounds more accurately than the pre-trained model.

C2 weakest assumption

Human feedback on image-text alignment is consistent enough to be captured by a learned reward function that generalizes to new prompts and does not introduce unintended biases during fine-tuning.

C3 one-line summary

A three-stage fine-tuning process uses human ratings to train a reward model and then improves text-to-image alignment by maximizing reward-weighted likelihood.
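
To make "reward-weighted likelihood" concrete, here is a minimal sketch of such a loss in plain Python. It assumes each training sample comes with a model log-likelihood and a scalar score from the learned reward model. The function name and inputs are hypothetical, and this is not the paper's exact objective, which may include further terms.

```python
def reward_weighted_nll(log_probs, rewards):
    """Negative reward-weighted log-likelihood, averaged over samples.

    log_probs: per-sample log-likelihoods under the model being fine-tuned.
    rewards:   per-sample scores from a learned reward model (assumed input).
    Minimizing this value maximizes the reward-weighted likelihood.
    """
    if not log_probs or len(log_probs) != len(rewards):
        raise ValueError("log_probs and rewards must be non-empty and equal length")
    weighted = sum(r * lp for r, lp in zip(rewards, log_probs))
    return -weighted / len(log_probs)
```

Samples with a reward of zero contribute nothing to the loss, so the update is driven by the samples the reward model rates as well aligned.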

References

26 extracted · 26 resolved · 17 Pith anchors

[1] A General Language Assistant as a Laboratory for Alignment · arXiv:2112.00861
[2] arXiv preprint arXiv:1607.07086
[3] Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback · arXiv:2204.05862
[4] E., and Wang, W
[5] An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion · arXiv:2208.01618

Cited by

39 papers in Pith

Receipt and verification
First computed 2026-05-17T23:39:21.805215Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

8ba9c6f111c6767aed9f3aa6ade31ee6d7254c3ff5008fbdfaa60efc0bdb941f

Aliases

arxiv: 2302.12192 · arxiv_version: 2302.12192v1 · doi: 10.48550/arxiv.2302.12192 · pith_short_12: ROU4N4IRYZ3H · pith_short_16: ROU4N4IRYZ3HV3M7 · pith_short_8: ROU4N4IR
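
The identifiers on this page appear to be related in a simple way. The short aliases are prefixes of the full Pith Number, and the Pith Number itself matches the first 26 characters of the standard Base32 encoding of the canonical hash bytes. This is inferred from the values shown here, not from a published specification, so the sketch below is an observation rather than the official derivation. The function names are hypothetical.

```python
import base64

# Canonical hash as printed on this page.
CANONICAL_HASH = "8ba9c6f111c6767aed9f3aa6ade31ee6d7254c3ff5008fbdfaa60efc0bdb941f"


def pith_number_from_hash(hex_digest, length=26):
    """Base32-encode the raw digest bytes and keep the first `length` characters.

    Assumption: the 26-character truncation is read off this record only.
    """
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


def short_aliases(pith_number):
    """Build the pith_short_N aliases as plain prefixes of the Pith Number."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


pith = pith_number_from_hash(CANONICAL_HASH)
print(pith)  # ROU4N4IRYZ3HV3M7HKTK3YY643
print(short_aliases(pith))
```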
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/ROU4N4IRYZ3HV3M7HKTK3YY643 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 8ba9c6f111c6767aed9f3aa6ade31ee6d7254c3ff5008fbdfaa60efc0bdb941f
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "2c26c5b752bc8facff7879651ec9d50c096133e39800c4894b64e41096fcbb67",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CV"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2023-02-23T17:34:53Z",
    "title_canon_sha256": "b9f9d0a75704678b4acde8885bfbea84f3cb70eca4a6315ff07a9f7d2fb3b1f0"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2302.12192",
    "kind": "arxiv",
    "version": 1
  }
}
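
The verification one-liner above can also be written as a small Python module. This minimal sketch mirrors the same serialization settings (sorted keys, compact separators, no ASCII escaping, UTF-8 encoding) and operates on a record that has already been fetched and parsed. The names `canonical_bytes`, `canonical_sha256`, and `matches` are illustrative and not part of any Pith API.

```python
import hashlib
import json


def canonical_bytes(record):
    """Serialize a parsed JSON record deterministically, as in the one-liner."""
    text = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters unescaped
    )
    return text.encode("utf-8")


def canonical_sha256(record):
    """Return the hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


def matches(record, expected_hex):
    """Compare the recomputed digest with a published one, ignoring hex case."""
    return canonical_sha256(record) == expected_hex.lower()
```

To check this record, parse the `canonical_record` object from the API response with `json.loads` and pass it to `matches` together with the canonical hash shown on this page. Because keys are sorted before hashing, two records with the same content but different key order produce the same digest.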