pith:4Q5XUAZ3
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
Decoupling the visual encoder into separate pathways lets a single transformer handle both multimodal understanding and generation without performance trade-offs.
arxiv:2410.13848 v1 · 2024-10-17 · cs.CV · cs.AI · cs.CL
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{4Q5XUAZ334OCLVGJRTDYNWGGXY}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Experiments show that Janus surpasses previous unified models and matches or exceeds the performance of task-specific models.
The paper assumes that the conflict arising from the differing information granularity needed for understanding versus generation is the main performance bottleneck, and that decoupling the encoders will resolve it without introducing new integration problems in the shared transformer.
Janus decouples visual encoding into task-specific pathways inside a single autoregressive transformer to unify multimodal understanding and generation while outperforming earlier unified models.
Receipt and verification
| First computed | 2026-05-17T23:38:49.991947Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
e43b7a033bdf1c25d4c98cc786d8c6be3f0a249cee13c37b6adeec1632aa6689
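The Pith Number shown on this page appears to be derived from this hash: Base32-encoding the first 16 bytes of the digest and dropping the padding reproduces `4Q5XUAZ334OCLVGJRTDYNWGGXY`, and its first eight characters give the short form `pith:4Q5XUAZ3`. This is an observation about this one record, not a documented rule of the `pith-number/v1.0` schema; the function name below is hypothetical.

```python
import base64

# Canonical hash exactly as printed on this page.
CANONICAL_HASH = "e43b7a033bdf1c25d4c98cc786d8c6be3f0a249cee13c37b6adeec1632aa6689"


def pith_number_from_hash(hex_digest: str) -> str:
    """Base32-encode the first 16 bytes of a hex digest, without '=' padding.

    Assumption: this mirrors how the identifier on this page relates to its
    hash; it is inferred from this record only, not from a published spec.
    """
    prefix = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(prefix).decode("ascii").rstrip("=")


number = pith_number_from_hash(CANONICAL_HASH)
print(number)      # 4Q5XUAZ334OCLVGJRTDYNWGGXY
print(number[:8])  # 4Q5XUAZ3
```

If the relationship holds for other records, it would allow a quick consistency check between an identifier and its hash without any network call.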
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/4Q5XUAZ334OCLVGJRTDYNWGGXY \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: e43b7a033bdf1c25d4c98cc786d8c6be3f0a249cee13c37b6adeec1632aa6689
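The inline Python step of the pipeline above can be unpacked into a small function. This is a minimal sketch, assuming the record has already been fetched and parsed; `canonical_sha256` is a name chosen for illustration and is not part of any Pith API. The record literal is copied from the canonical record JSON on this page, and no network access is involved.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record the same way the one-liner does: keys sorted,
    compact separators, non-ASCII kept as-is, UTF-8 encoded."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode()
    return hashlib.sha256(canonical).hexdigest()


# Canonical record as shown on this page (key order here is irrelevant).
record = {
    "metadata": {
        "abstract_canon_sha256": "ccac9affd22a03abd24c36a6f1dbfd031a9ff1e4871892b274524fd558580680",
        "cross_cats_sorted": ["cs.AI", "cs.CL"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2024-10-17T17:58:37Z",
        "title_canon_sha256": "5fdd467fda2d0c8f5f6d4b978c5f95603eb588043dd172c2860ea2b4301e3512",
    },
    "schema_version": "1.0",
    "source": {"id": "2410.13848", "kind": "arxiv", "version": 1},
}

# Compare this digest by eye against the "# expect:" value above.
print(canonical_sha256(record))
```

Sorting keys and fixing the separators makes the serialization independent of how the JSON was pretty-printed or ordered, which is why the digest can be reproduced from any faithful copy of the record.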
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "ccac9affd22a03abd24c36a6f1dbfd031a9ff1e4871892b274524fd558580680",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2024-10-17T17:58:37Z",
    "title_canon_sha256": "5fdd467fda2d0c8f5f6d4b978c5f95603eb588043dd172c2860ea2b4301e3512"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2410.13848",
    "kind": "arxiv",
    "version": 1
  }
}