Pith Number

pith:AT7SNXJ2

pith:2022:AT7SNXJ2YUBDO47YGBKZXAWQHS
not attested · not anchored · not stored · refs resolved

Prompt-to-Prompt Image Editing with Cross Attention Control

Amir Hertz, Daniel Cohen-Or, Jay Tenenbaum, Kfir Aberman, Ron Mokady, Yael Pritch

Cross-attention layers let users edit images by changing only the text prompt.

arxiv:2208.01626 v1 · 2022-08-02 · cs.CV · cs.CL · cs.GR · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{AT7SNXJ2YUBDO47YGBKZXAWQHS}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

the cross-attention layers are the key to controlling the relation between the spatial layout of the image to each word in the prompt. With this observation, we present several applications which monitor the image synthesis by editing the textual prompt only.

C2 · weakest assumption

That the cross-attention mechanism is the dominant and controllable factor for spatial word-to-region mapping in the underlying generative model, and that targeted edits to these maps during inference will not introduce artifacts or require model retraining.

C3 · one line summary

Cross-attention control in text-conditioned models enables localized and global image edits by editing only the input text prompt.
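The claims above turn on cross-attention maps, which tie each image position to the prompt tokens it attends to. As a rough illustration only, the sketch below computes such maps with standard scaled dot-product attention on toy data, then reuses the maps from a source prompt while taking token values from an edited prompt. All names (`attention_maps`, `cross_attention`, `injected_maps`) and the toy vectors are hypothetical, and the injection rule is a simplification assumed for this example, not the paper's implementation.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_maps(queries, keys):
    """One row per pixel: how strongly that pixel attends to each prompt token."""
    scale = math.sqrt(len(keys[0]))
    return [
        softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        for q in queries
    ]


def cross_attention(queries, keys, values, injected_maps=None):
    """Weighted sum of token values per pixel.

    If injected_maps is given, it replaces the maps computed from keys
    (assumed simplification of attention control, for illustration only).
    """
    maps = injected_maps if injected_maps is not None else attention_maps(queries, keys)
    dim = len(values[0])
    return [
        [sum(w * v[j] for w, v in zip(row, values)) for j in range(dim)]
        for row in maps
    ]


# Toy setup: 2 pixels, 2 prompt tokens, feature size 2 (all values made up).
queries = [[1.0, 0.0], [0.0, 1.0]]
keys_src = [[4.0, 0.0], [0.0, 4.0]]    # source prompt: pixel 0 binds to token 0
keys_edit = [[0.0, 4.0], [4.0, 0.0]]   # edited prompt would bind pixel 0 to token 1
values_edit = [[1.0], [0.0]]           # content carried by the edited prompt's tokens

source_maps = attention_maps(queries, keys_src)
free = cross_attention(queries, keys_edit, values_edit)
controlled = cross_attention(queries, keys_edit, values_edit, injected_maps=source_maps)

print("free:", free)
print("controlled:", controlled)
```

With the source maps injected, each pixel keeps the token binding it had under the source prompt, so the layout follows the source while the values come from the edited prompt.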

References

51 extracted · 51 resolved · 6 Pith anchors

[1] Image2stylegan: How to embed images into the stylegan latent space? In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4432–4441, 2019.
[2] Clip2stylegan: Unsupervised extraction of stylegan edit directions, 2021.
[3] Hyperstyle: Stylegan inversion with hypernetworks for real image editing, 2022.
[4] arXiv preprint arXiv:2206.02779, 2022.
[5] Blended diffusion for text-driven editing of natural images, 2022.

Formal links

2 machine-checked theorem links

Cited by

152 papers in Pith

Receipt and verification
First computed: 2026-07-05T04:45:35.578343Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

04ff26dd3ac5023773f830559b82d03cbb9b046433b0bcf6f9402b0a74893087

Aliases

arxiv: 2208.01626 · arxiv_version: 2208.01626v1 · doi: 10.48550/arxiv.2208.01626 · pith_short_12: AT7SNXJ2YUBD · pith_short_16: AT7SNXJ2YUBDO47Y · pith_short_8: AT7SNXJ2
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/AT7SNXJ2YUBDO47YGBKZXAWQHS \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 04ff26dd3ac5023773f830559b82d03cbb9b046433b0bcf6f9402b0a74893087
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "21a051649f91096729f99e9a3accb638b7af913634eb3aa79930b28f7a40f2a6",
    "cross_cats_sorted": [
      "cs.CL",
      "cs.GR",
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2022-08-02T17:55:41Z",
    "title_canon_sha256": "6f045d47f77f9d62c080c9273555e654308291b07760a0742e4f3abcf0504773"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2208.01626",
    "kind": "arxiv",
    "version": 1
  }
}
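
The same check can be run offline against the record listed above. The sketch below mirrors the one-liner's canonicalization (sorted keys, compact separators, UTF-8) and assumes the JSON shown on this page is exactly the `canonical_record` the endpoint returns. The helper names `canonical_bytes` and `canonical_hash` are made up for this example.

```python
import hashlib
import json

# Record copied from the "Canonical record JSON" listing on this page.
RECORD_TEXT = """
{
  "metadata": {
    "abstract_canon_sha256": "21a051649f91096729f99e9a3accb638b7af913634eb3aa79930b28f7a40f2a6",
    "cross_cats_sorted": ["cs.CL", "cs.GR", "cs.LG"],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2022-08-02T17:55:41Z",
    "title_canon_sha256": "6f045d47f77f9d62c080c9273555e654308291b07760a0742e4f3abcf0504773"
  },
  "schema_version": "1.0",
  "source": {"id": "2208.01626", "kind": "arxiv", "version": 1}
}
"""

# Hash published under "Canonical hash" on this page.
EXPECTED = "04ff26dd3ac5023773f830559b82d03cbb9b046433b0bcf6f9402b0a74893087"


def canonical_bytes(record):
    """Serialize with sorted keys and no insignificant whitespace, as the one-liner does."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_hash(record):
    """SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


record = json.loads(RECORD_TEXT)
digest = canonical_hash(record)
print(digest)
print("matches published hash:", digest == EXPECTED)
```

Because keys are sorted before hashing, the digest does not depend on the key order or whitespace of the input text. A mismatch would suggest that the record differs from the published one or that the canonicalization assumed here is not the one the builder uses.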