Pith Number

pith:4Q5XUAZ3

pith:2024:4Q5XUAZ334OCLVGJRTDYNWGGXY
not attested · not anchored · not stored · refs resolved

Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation

Chengyue Wu, Chong Ruan, Ping Luo, Wen Liu, Xiaokang Chen, Xingchao Liu, Xingkai Yu, Yiyang Ma, Zhenda Xie, Zhiyu Wu, Zizheng Pan

Decoupling the visual encoder into separate pathways lets a single transformer handle both multimodal understanding and generation without performance trade-offs.

arxiv:2410.13848 v1 · 2024-10-17 · cs.CV · cs.AI · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{4Q5XUAZ334OCLVGJRTDYNWGGXY}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Experiments show that Janus surpasses previous unified models and matches or exceeds the performance of task-specific models.

C2 · weakest assumption

Assumes that the conflict arising from the differing information granularity needed for understanding versus generation is the main performance bottleneck, and that decoupling the encoders resolves it without introducing new integration problems in the shared transformer.

C3 · one-line summary

Janus decouples visual encoding into task-specific pathways inside a single autoregressive transformer to unify multimodal understanding and generation while outperforming earlier unified models.

References

96 extracted · 96 resolved · 38 Pith anchors

[1] GPT-4 Technical Report 2023 · arXiv:2303.08774
[2] The Claude 3 Model Family: Opus, Sonnet, Haiku 2024
[3] Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond 2023 · arXiv:2308.12966
[4] arXiv preprint 2023 · arXiv:2306.16934
[5] DeepSeek LLM: Scaling Open-Source Language Models with Longtermism 2024 · arXiv:2401.02954

Formal links

3 machine-checked theorem links

Cited by

42 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:49.991947Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

e43b7a033bdf1c25d4c98cc786d8c6be3f0a249cee13c37b6adeec1632aa6689

Aliases

arxiv: 2410.13848 · arxiv_version: 2410.13848v1 · doi: 10.48550/arxiv.2410.13848 · pith_short_12: 4Q5XUAZ334OC · pith_short_16: 4Q5XUAZ334OCLVGJ · pith_short_8: 4Q5XUAZ3
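The short aliases above are prefixes of the full Pith Number, and the full number on this page matches the unpadded RFC 4648 Base32 encoding of the first 16 bytes of the canonical hash. That relationship is inferred from the values shown in this record, not taken from a published Pith specification, so the sketch below is an observation under that assumption. The function name `pith_number_from_hash` and the `n_bytes` parameter are hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "e43b7a033bdf1c25d4c98cc786d8c6be3f0a249cee13c37b6adeec1632aa6689"
PITH_NUMBER = "4Q5XUAZ334OCLVGJRTDYNWGGXY"


def pith_number_from_hash(hex_digest: str, n_bytes: int = 16) -> str:
    """Base32-encode the first n_bytes of a hex digest and drop '=' padding.

    Assumption: this rule is inferred from this record's values only;
    it is not a documented Pith derivation.
    """
    prefix = bytes.fromhex(hex_digest)[:n_bytes]
    return base64.b32encode(prefix).decode("ascii").rstrip("=")


derived = pith_number_from_hash(CANONICAL_HASH)

# The short aliases are plain prefixes of the full identifier.
short_aliases = {n: derived[:n] for n in (8, 12, 16)}

print("derived matches page:", derived == PITH_NUMBER)
for n, alias in short_aliases.items():
    print(f"pith_short_{n}: {alias}")
```

If the inference holds for other records, a client could check that an identifier and a canonical hash belong together without a network call. The signature check would still be needed to establish who issued the record.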
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/4Q5XUAZ334OCLVGJRTDYNWGGXY \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: e43b7a033bdf1c25d4c98cc786d8c6be3f0a249cee13c37b6adeec1632aa6689
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "ccac9affd22a03abd24c36a6f1dbfd031a9ff1e4871892b274524fd558580680",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2024-10-17T17:58:37Z",
    "title_canon_sha256": "5fdd467fda2d0c8f5f6d4b978c5f95603eb588043dd172c2860ea2b4301e3512"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2410.13848",
    "kind": "arxiv",
    "version": 1
  }
}
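The verification one-liner above canonicalizes the record (sorted keys, no whitespace, UTF-8) before hashing. A minimal offline sketch of the same steps, applied to the record as printed on this page, might look like the following. The helper name `canonical_sha256` is hypothetical, and the comparison is reported rather than assumed, since the served record could differ from the text reproduced here.

```python
import hashlib
import json

EXPECTED = "e43b7a033bdf1c25d4c98cc786d8c6be3f0a249cee13c37b6adeec1632aa6689"

# Canonical record as printed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "ccac9affd22a03abd24c36a6f1dbfd031a9ff1e4871892b274524fd558580680",
        "cross_cats_sorted": ["cs.AI", "cs.CL"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2024-10-17T17:58:37Z",
        "title_canon_sha256": "5fdd467fda2d0c8f5f6d4b978c5f95603eb588043dd172c2860ea2b4301e3512",
    },
    "schema_version": "1.0",
    "source": {"id": "2410.13848", "kind": "arxiv", "version": 1},
}


def canonical_sha256(obj: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the UTF-8 bytes."""
    payload = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


digest = canonical_sha256(record)
print(digest)
print("matches expected:", digest == EXPECTED)
```

Sorting keys and fixing the separators is what makes the digest independent of how a mirror happens to order or indent the JSON. List order, as in `cross_cats_sorted`, is still significant.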