Taming transformers for high-resolution image synthesis
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
background 1
method 1
citation-polarity summary
verdicts
UNVERDICTED 2
representative citing papers
GR-2 pre-trains on web-scale videos then fine-tunes on robot data to reach 97.7% average success across over 100 manipulation tasks with strong generalization to new scenes and objects.
citing papers explorer
- Prompt-to-Prompt Image Editing with Cross Attention Control
  Cross-attention control in text-conditioned models enables localized and global image edits by editing only the input text prompt.
- GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation
  GR-2 pre-trains on web-scale videos then fine-tunes on robot data to reach 97.7% average success across over 100 manipulation tasks with strong generalization to new scenes and objects.