Pith Number

pith:26B4PUA7

pith:2025:26B4PUA7HOATI4KNCFQ6OEC5U3
not attested · not anchored · not stored · refs resolved

Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model

Fanshi Li, Fei Liu, Guofeng Wu, Jianchao Yang, Jie Wu, Liang Li, Linjie Yang, Lixue Gong, Liyang Liu, Peng Wang, Qi Zhang, Shijia Zhao, Shiqi Sun, Weilin Huang, Wei Liu, Wei Lu, Xiaochen Lian, Xiaoxia Hou, Xin Xia, Xinyu Zhang, Xuefeng Xiao, Xun Wang, Ye Wang, Yichun Shi, Yu Tian, Yuwei Zhang, Zhi Tian, Zhonghua Zhai

Seedream 2.0 uses a self-developed bilingual LLM text encoder to generate high-fidelity images from Chinese or English prompts with accurate cultural nuances.

arxiv:2503.07703 v1 · 2025-03-10 · cs.CV

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{26B4PUA7HOATI4KNCFQ6OEC5U3}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
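
The order-independence described above can be illustrated with a minimal sketch. It is not Pith's actual merge algorithm, which this page does not specify. The event fields (`id`, `ts`, `field`, `value`) and the last-writer-wins rule are hypothetical, chosen only to show how a mirror could arrive at the same state regardless of the order in which it received the events.

```python
from typing import Iterable

# Hypothetical event shape: {"id": str, "ts": str, "field": str, "value": object}.
# None of these field names come from the Pith bundle format.


def merge_events(events: Iterable[dict]) -> dict:
    """Fold events into a state that does not depend on arrival order."""
    # Deduplicate by event id so replayed or re-downloaded events are harmless.
    # Assumes two events with the same id always carry identical content.
    unique = {e["id"]: e for e in events}
    # Sort by (timestamp, id): the id breaks ties, giving a total order.
    ordered = sorted(unique.values(), key=lambda e: (e["ts"], e["id"]))
    state: dict = {}
    for event in ordered:
        state[event["field"]] = event["value"]  # last writer wins
    return state


# Made-up events, for illustration only.
events = [
    {"id": "e1", "ts": "2026-05-17T23:38:14Z", "field": "citations", "value": "open"},
    {"id": "e2", "ts": "2026-05-18T08:00:00Z", "field": "citations", "value": "closed"},
    {"id": "e3", "ts": "2026-05-18T08:00:00Z", "field": "replications", "value": "open"},
]

state_a = merge_events(events)
state_b = merge_events(reversed(events + events))  # reordered, with duplicates
print(state_a == state_b)
```

Sorting on a total order before folding is what makes the result reproducible: two mirrors that hold the same set of events compute the same state even if they received them in different orders.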

Claims

C1 · strongest claim

Through extensive experimentation, we demonstrate that Seedream 2.0 achieves state-of-the-art performance across multiple aspects, including prompt-following, aesthetics, text rendering, and structural correctness. Furthermore, Seedream 2.0 has been optimized through multiple RLHF iterations to closely align its output with human preferences, as revealed by its outstanding ELO score.

C2 · weakest assumption

That the self-developed bilingual LLM text encoder and the custom data/caption systems allow the model to learn native Chinese knowledge directly from data without introducing new biases or requiring post-hoc fixes that undermine the claimed native performance.

C3 · one line summary

Seedream 2.0 is a native Chinese-English bilingual diffusion model that integrates a self-developed LLM text encoder, Glyph-Aligned ByT5, and Scaled ROPE to reach claimed state-of-the-art results in prompt following, aesthetics, text rendering, and human preference alignment via RLHF.

References

44 extracted · 44 resolved · 8 Pith anchors

[1] Training Diffusion Models with Reinforcement Learning 2023 · arXiv:2305.13301
[2] InstructPix2Pix: Learning to follow image editing instructions 2023
[3] MasaCtrl: Tuning-free mutual self-attention control for consistent image synthesis and editing 2023
[4] TextDiffuser-2: Unleashing the power of language models for text rendering 2024
[5] AltCLIP: Altering the language encoder in CLIP for extended language capabilities 2022

Formal links

2 machine-checked theorem links

Cited by

19 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:14.574025Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

d783c7d01f3b8134714d1161e7105da6f90b842d6842ffd0c6fb1f0d3a02c1c2

Aliases

arxiv: 2503.07703 · arxiv_version: 2503.07703v1 · doi: 10.48550/arxiv.2503.07703 · pith_short_12: 26B4PUA7HOAT · pith_short_16: 26B4PUA7HOATI4KN · pith_short_8: 26B4PUA7
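
The aliases and the canonical hash on this page appear to be related: base32-encoding the SHA-256 digest (RFC 4648 alphabet) and keeping the first 26 characters reproduces this record's Pith Number, and the short aliases are its 8-, 12-, and 16-character prefixes. This is an observation from the values shown here, not a documented rule of the scheme. The helper name `pith_from_hash` is made up for illustration.

```python
import base64


def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode a hex digest and keep the first `length` characters.

    Assumption: inferred from this record's values, not from a Pith spec.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


canonical_hash = "d783c7d01f3b8134714d1161e7105da6f90b842d6842ffd0c6fb1f0d3a02c1c2"
pith = pith_from_hash(canonical_hash)

print(pith)       # 26B4PUA7HOATI4KNCFQ6OEC5U3
print(pith[:8])   # 26B4PUA7 (pith_short_8)
print(pith[:12])  # 26B4PUA7HOAT (pith_short_12)
print(pith[:16])  # 26B4PUA7HOATI4KN (pith_short_16)
```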
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/26B4PUA7HOATI4KNCFQ6OEC5U3 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: d783c7d01f3b8134714d1161e7105da6f90b842d6842ffd0c6fb1f0d3a02c1c2
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "32a97c3ea5cc16f3df4e2381704c161cc3204d1b3a91a392238e59cbed760f09",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2025-03-10T17:58:33Z",
    "title_canon_sha256": "92d43a42b4d765aa91d70be74acd529bec7900888860943c5bd4fbeb07ee27cf"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2503.07703",
    "kind": "arxiv",
    "version": 1
  }
}
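
For readers who prefer not to chain `curl`, `jq`, and an inline script, the hashing step of the command above can be sketched in Python. It applies the same serialization (sorted keys, compact separators, `ensure_ascii=False`, UTF-8) to the canonical record JSON shown above. The function name `canonical_sha256` is an assumption, and fetching the record over HTTP is left out so the snippet runs offline.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record with the same serialization as the shell one-liner."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode()  # str.encode() defaults to UTF-8
    return hashlib.sha256(payload).hexdigest()


# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "32a97c3ea5cc16f3df4e2381704c161cc3204d1b3a91a392238e59cbed760f09",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2025-03-10T17:58:33Z",
        "title_canon_sha256": "92d43a42b4d765aa91d70be74acd529bec7900888860943c5bd4fbeb07ee27cf",
    },
    "schema_version": "1.0",
    "source": {"id": "2503.07703", "kind": "arxiv", "version": 1},
}

expected = "d783c7d01f3b8134714d1161e7105da6f90b842d6842ffd0c6fb1f0d3a02c1c2"
digest = canonical_sha256(record)
print(digest)
print("matches published hash:", digest == expected)
```

Because keys are sorted before hashing, the order in which a mirror stores the fields does not affect the digest.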