Pith Number

pith:JP4GN6WR

pith:2023:JP4GN6WR2H4HK2JEYMBDM2W7JL
not attested · not anchored · not stored · refs resolved

Adding Conditional Control to Text-to-Image Diffusion Models

Lvmin Zhang, Anyi Rao, Maneesh Agrawala

ControlNet adds spatial controls like edges, depth, and human poses to pretrained text-to-image diffusion models.

arxiv:2302.05543 v3 · 2023-02-10 · cs.CV · cs.AI · cs.GR · cs.HC · cs.MM

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{JP4GN6WR2H4HK2JEYMBDM2W7JL}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls.

C2 · weakest assumption

The zero convolutions progressively grow parameters from zero and ensure that no harmful noise could affect the finetuning, allowing the pretrained backbone to remain intact while learning new controls.
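
As an illustration of the assumption above, the following is a minimal sketch of a zero convolution: a 1x1 convolution whose weight and bias both start at zero, wrapped around a trainable copy of a locked block. The function names (`conv1x1`, `frozen_block`, `trainable_copy`, `controlled_block`) and the toy block `2c + 1` are hypothetical stand-ins chosen for this example, not the authors' code. It only shows the starting state, where the control branch contributes nothing and the output equals that of the locked model.

```python
# Sketch of the zero-convolution starting state. All names are hypothetical.

def conv1x1(x, weight, bias):
    """Apply a 1x1 convolution to x: a list of pixels, each a list of channels."""
    return [
        [sum(w * c for w, c in zip(row, pixel)) + b for row, b in zip(weight, bias)]
        for pixel in x
    ]

def zero_conv_params(channels):
    """Weight and bias are both initialised to zero."""
    weight = [[0.0] * channels for _ in range(channels)]
    bias = [0.0] * channels
    return weight, bias

def frozen_block(x):
    """Stand-in for a locked pretrained block (an arbitrary fixed function)."""
    return [[2.0 * c + 1.0 for c in pixel] for pixel in x]

def trainable_copy(x):
    """Stand-in for the trainable copy, initialised identically to the locked block."""
    return frozen_block(x)

def add(a, b):
    """Element-wise sum of two feature maps with the same shape."""
    return [[p + q for p, q in zip(pa, pb)] for pa, pb in zip(a, b)]

def controlled_block(x, cond, z_in, z_out):
    """y = F(x) + Z_out(F_copy(x + Z_in(cond)))."""
    injected = add(x, conv1x1(cond, *z_in))
    return add(frozen_block(x), conv1x1(trainable_copy(injected), *z_out))

x = [[0.5, -1.0], [2.0, 0.25]]      # toy feature map: 2 pixels, 2 channels
cond = [[1.0, 1.0], [0.0, 3.0]]     # toy conditioning map of the same shape
z_in = zero_conv_params(2)
z_out = zero_conv_params(2)

y = controlled_block(x, cond, z_in, z_out)
baseline = frozen_block(x)
print(y == baseline)  # the control branch adds exactly zero at initialisation
```

Once training moves the zero-convolution parameters away from zero, the second term starts to contribute. Whether that growth is always harmless is the assumption this claim flags.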

C3 · one line summary

ControlNet adds spatial conditioning controls to pretrained text-to-image diffusion models via zero convolutions for stable fine-tuning on small or large datasets.

References

99 extracted · 99 resolved · 6 Pith anchors

[1] Weight initialization in neural network, inspired by Andrew Ng, https://medium.com/@safrin1128/weight-initialization-in-neural-network-inspired-by-andrew-ng-e0066dc4a566, 2020
[2] Intrinsic dimensionality explains the effectiveness of language model fine-tuning, 2021
[3] Only a matter of style: Age transformation using a style-based regression model, 2021
[4] HyperStyle: StyleGAN inversion with hypernetworks for real image editing, 2022
[5] Disco Diffusion, https://github.com/alembics/disco-diffusion, 2022

Formal links

2 machine-checked theorem links

Cited by

24 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:46.350422Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

4bf866fad1d1f8756924c302366adf4ad0805b0a40be1eb3b7ca51069681aa14

Aliases

arxiv: 2302.05543 · arxiv_version: 2302.05543v3 · doi: 10.48550/arxiv.2302.05543 · pith_short_12: JP4GN6WR2H4H · pith_short_16: JP4GN6WR2H4HK2JE · pith_short_8: JP4GN6WR
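
The identifiers on this page appear to be related in a simple way. The short aliases are prefixes of the full Pith Number, and the Pith Number itself matches the leading 26 characters of the base32 encoding of the canonical hash. This is inferred from the values shown here and is not a documented rule of the pith-number/v1.0 schema. The sketch below checks it for this record.

```python
import base64

# Values copied from this page.
canonical_hash = "4bf866fad1d1f8756924c302366adf4ad0805b0a40be1eb3b7ca51069681aa14"
pith_number = "JP4GN6WR2H4HK2JEYMBDM2W7JL"

# Assumption: the number is a truncated RFC 4648 base32 encoding of the hash bytes.
b32 = base64.b32encode(bytes.fromhex(canonical_hash)).decode("ascii")
derived = b32[: len(pith_number)]

# Assumption: each pith_short_N alias is the first N characters of the number.
aliases = {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}

print(derived == pith_number)
print(aliases)
```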
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/JP4GN6WR2H4HK2JEYMBDM2W7JL \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 4bf866fad1d1f8756924c302366adf4ad0805b0a40be1eb3b7ca51069681aa14
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "354cfba2220db73fe22bf3e76b365043fbef9ec456ed18ece6e21b9722bac2d5",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.GR",
      "cs.HC",
      "cs.MM"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2023-02-10T23:12:37Z",
    "title_canon_sha256": "c802ad03716e6f3260d24a90dcef2626bcb68f2f1371aaab7f3fa8d320eb8edb"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2302.05543",
    "kind": "arxiv",
    "version": 3
  }
}
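
For readers who prefer a script to the shell one-liner above, here is a minimal offline sketch of the same canonicalisation step. It serialises the record with sorted keys and compact separators, then hashes the UTF-8 bytes. The function name `canonical_sha256` is hypothetical, and the record is transcribed from this page rather than fetched, so any transcription difference would change the digest. Compare the printed value against the canonical hash listed above instead of assuming a match.

```python
import hashlib
import json

def canonical_sha256(record: dict) -> str:
    """Hash a record using the same JSON settings as the one-liner above."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode()).hexdigest()

# Canonical record as shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "354cfba2220db73fe22bf3e76b365043fbef9ec456ed18ece6e21b9722bac2d5",
        "cross_cats_sorted": ["cs.AI", "cs.GR", "cs.HC", "cs.MM"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2023-02-10T23:12:37Z",
        "title_canon_sha256": "c802ad03716e6f3260d24a90dcef2626bcb68f2f1371aaab7f3fa8d320eb8edb",
    },
    "schema_version": "1.0",
    "source": {"id": "2302.05543", "kind": "arxiv", "version": 3},
}

digest = canonical_sha256(record)
print(digest)
```

Because keys are sorted before hashing, the order in which a mirror stores the fields does not affect the digest. List order, as in `cross_cats_sorted`, still matters.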