Pith Number

pith:N4CBQLK5

pith:2024:N4CBQLK5SGMGON3BE2FVVZUSML
not attested · not anchored · not stored · refs resolved

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Chao Zhang, Chen Zhang, Dayou Chen, Di Wang, Dongdong Wang, Jiabin Huang, Jiahao Li, Jiajun He, Jianchen Zhu, Jiangfeng Xiong, Jianwei Zhang, Jianxiang Lu, Jie Jiang, Jie Liu, Jihong Zhang, Jinbao Xue, Kai Liu, Meng Chen, Minbin Huang, Mingtao Chen, Qinglin Lu, Qin Lin, Rongwei Quan, Sihuan Lin, Wei Liu, Weiyan Wang, Wenyue Li, Xiao Xiao, Xiaoxiao Zheng, Xiaoyan Yuan, Xinchi Deng, Xingchao Liu, Yan Chen, Yangyu Tao, Yanxin Long, Yifu Sun, Yingfang Zhang, Yixuan Li, Yong Yang, Yuhong Liu, Yun Li, Zedong Xiao, Zheng Fang, Zhichao Hu, Zhimin Li

Hunyuan-DiT is a diffusion transformer that generates images from Chinese text with state-of-the-art detail through custom architecture and refined data handling.

arxiv:2405.08748 v1 · 2024-05-14 · cs.CV

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{N4CBQLK5SGMGON3BE2FVVZUSML}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1. Bitcoin timestamp
2. Internet Archive
3. Author claim: open
4. Citations: open
5. Replications: open

Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Through our holistic human evaluation protocol with more than 50 professional human evaluators, Hunyuan-DiT sets a new state-of-the-art in Chinese-to-image generation compared with other open-source models.

C2 · weakest assumption

The human evaluation protocol with 50+ evaluators fairly measures fine-grained Chinese understanding without bias from prompt selection, evaluator background, or post-hoc comparison choices.

C3 · one-line summary

Hunyuan-DiT is a new multi-resolution diffusion transformer that achieves state-of-the-art Chinese text-to-image generation through custom architecture, data pipelines, and multimodal caption refinement.

References

41 extracted · 41 resolved · 6 Pith anchors

[1] https://www.midjourney.com/home
[2] Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond 2023 · arXiv:2308.12966
[3] eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers 2022 · arXiv:2211.01324
[4] All are worth words: A ViT backbone for diffusion models 2023
[5] Improving image generation with better captions 2023

Formal links

1 machine-checked theorem link

Cited by

39 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:47.512816Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

6f04182d5d9198673761268b5ae69262cff80490a194e1d9a7ebe5dbb5ab19b7

Aliases

arxiv: 2405.08748 · arxiv_version: 2405.08748v1 · doi: 10.48550/arxiv.2405.08748 · pith_short_12: N4CBQLK5SGMG · pith_short_16: N4CBQLK5SGMGON3B · pith_short_8: N4CBQLK5
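
Judging only from the values shown on this page, the full Pith Number appears to be the first 26 characters of the RFC 4648 Base32 encoding of the canonical SHA-256 hash, and each `pith_short_*` alias appears to be a prefix of it. That relationship is inferred from the identifiers above, not taken from documented Pith behavior, and the function names below are hypothetical. A minimal Python sketch under that assumption:

```python
import base64

# Canonical hash as listed on this page.
CANONICAL_HASH = "6f04182d5d9198673761268b5ae69262cff80490a194e1d9a7ebe5dbb5ab19b7"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the first `length` characters.

    Assumption: this rule is inferred from the identifiers on this page;
    it is not a documented part of the Pith schema.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_alias(pith_number: str, n: int) -> str:
    """Return the n-character prefix, mirroring the pith_short_<n> aliases."""
    return pith_number[:n]


full = pith_number_from_hash(CANONICAL_HASH)
print("pith:", full)
for n in (8, 12, 16):
    print(f"pith_short_{n}:", short_alias(full, n))
```

For the hash on this page, the sketch yields the same full identifier and the same three short aliases listed above, which is what suggests the prefix rule in the first place.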
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/N4CBQLK5SGMGON3BE2FVVZUSML \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 6f04182d5d9198673761268b5ae69262cff80490a194e1d9a7ebe5dbb5ab19b7
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "608279be699ea2ea47c30b3d92f93c2d0c6b6602af234c4513d133a411f0fd80",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2024-05-14T16:33:25Z",
    "title_canon_sha256": "afbd35cf94824e1120a5343405c22f755b9c1d221d72612be2c8eac55caee103"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2405.08748",
    "kind": "arxiv",
    "version": 1
  }
}
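
The shell check under "Verify this Pith Number yourself" can also be sketched offline in Python. The function name `canonical_sha256` is hypothetical; the serialization settings (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 encoding) are copied from that one-liner. Whether hashing the record as printed here reproduces the published hash depends on the page showing the record verbatim, so the sketch compares the two values rather than assuming a match.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the UTF-8 bytes."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Canonical record transcribed from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "608279be699ea2ea47c30b3d92f93c2d0c6b6602af234c4513d133a411f0fd80",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2024-05-14T16:33:25Z",
        "title_canon_sha256": "afbd35cf94824e1120a5343405c22f755b9c1d221d72612be2c8eac55caee103",
    },
    "schema_version": "1.0",
    "source": {"id": "2405.08748", "kind": "arxiv", "version": 1},
}

# Published canonical hash from this page, used only for comparison.
EXPECTED = "6f04182d5d9198673761268b5ae69262cff80490a194e1d9a7ebe5dbb5ab19b7"

digest = canonical_sha256(record)
print(digest)
print("matches published hash:", digest == EXPECTED)
```

Because keys are sorted before hashing, the result does not depend on the order in which the JSON object's fields were written, which is the property that lets any mirror recompute the same value.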