pith:MMLQUVCX
RePack then Refine: Efficient Diffusion Transformer with Vision Foundation Model
Compressing VFM features to a low-dimensional manifold lets DiTs reach FID 1.82 on ImageNet in 64 epochs, then a refiner improves it to 1.65.
arxiv:2512.12083 v3 · 2025-12-12 · cs.CV
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{MMLQUVCXA32J35G5AJ2JE6TDDK}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
On ImageNet-1K, RePack-DiT-XL/1 achieves an FID of 1.82 in only 64 training epochs. With the Refiner module, performance further improves to an FID of 1.65, significantly surpassing latest LDMs in terms of convergence efficiency.
The approach assumes that the RePack projection to a compact manifold preserves enough structural information from the VFM features for the subsequent Latent-Guided Refiner to restore high-frequency details reliably, without introducing artifacts or requiring extensive additional training.
RePack projects VFM features to a low-dimensional manifold for efficient DiT training, followed by a Latent-Guided Refiner that improves FID to 1.65 on ImageNet-1K after 64 epochs.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:39:16.861851Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
63170a545706f49df4dd0274927a631abed14e8fa86ae28ac5628f4917a3dc5e
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/MMLQUVCXA32J35G5AJ2JE6TDDK \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 63170a545706f49df4dd0274927a631abed14e8fa86ae28ac5628f4917a3dc5e
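The canonicalization step in the command above can also be written as a small standalone function. This is a minimal sketch that mirrors the inline one-liner (sorted keys, compact separators, non-ASCII preserved, UTF-8 bytes hashed with SHA-256). The function name `canonical_hash` and the two toy records are illustrative assumptions, not part of the Pith API.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """Return the SHA-256 hex digest of a record's canonical JSON form."""
    # Canonical form, as in the one-liner: keys sorted, no whitespace
    # around separators, non-ASCII kept as-is, then UTF-8 encoded.
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


# Hypothetical toy records: identical content, different key order.
a = {"source": {"id": "2512.12083", "kind": "arxiv", "version": 3}, "schema_version": "1.0"}
b = {"schema_version": "1.0", "source": {"version": 3, "kind": "arxiv", "id": "2512.12083"}}

# Sorting keys makes the digest independent of key order.
print(canonical_hash(a) == canonical_hash(b))  # prints True
```

To check a real record, pass the parsed `canonical_record` object from the API response to the function and compare the result with the canonical hash listed on this page.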
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "4d115b9aff5639ffffaf7dc8c8ebe275a1cbdc58caec78d0b510bfb5c891269e",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/publicdomain/zero/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2025-12-12T23:17:41Z",
    "title_canon_sha256": "0645f829babe1a72977dd2b28277921e7c0b124ae6a1f07e5ef5f173b8f2c218"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2512.12083",
    "kind": "arxiv",
    "version": 3
  }
}