{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2026:4KFEMSFZXVKKVLX2SZ4RNRZ6WH","short_pith_number":"pith:4KFEMSFZ","schema_version":"1.0","canonical_sha256":"e28a4648b9bd54aaaefa967916c73eb1c3405e39f598951dd15a098508ea06d3","source":{"kind":"arxiv","id":"2605.13013","version":1},"attestation_state":"computed","paper":{"title":"JEDI: Joint Embedding Diffusion World Model for Online Model-Based Reinforcement Learning","license":"http://creativecommons.org/licenses/by/4.0/","headline":"JEDI trains an end-to-end latent diffusion world model by learning predictive latents directly from the diffusion denoising loss inside a JEPA framework.","cross_cats":[],"primary_cat":"cs.LG","authors_text":"Dianbo Liu, Haozhe Ma, Jing Yu Lim, Rushi Shah, Samson Yu, Tze-Yun Leong, Zarif Ikram","submitted_at":"2026-05-13T05:07:32Z","abstract_excerpt":"Diffusion world models have recently become competitive for online model-based reinforcement learning, but current approaches expose a tension: pixel diffusion is effective but computationally expensive while the latest latent diffusion approach improves efficiency yet performs subpar. The latter also relies on separately trained latents rather than the end-to-end world-model objectives that have driven much of modern MBRL progress. In particular, JEPA-style predictive representation learning has emerged as an especially promising direction for world modeling and MBRL. Concurrently, diffusion-"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2605.13013","kind":"arxiv","version":1},"metadata":{"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.LG","submitted_at":"2026-05-13T05:07:32Z","cross_cats_sorted":[],"title_canon_sha256":"c814de03040f53e2e8e811ce7caa1e18fdc876c632131e48255a62973a710941","abstract_canon_sha256":"290ca9412f9ef4ab54eca3f7ef1b9d819478dfe49e44f04e95302711915c4974"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-18T03:09:00.222946Z","signature_b64":"3tz85DwPWy8ht1GH1dUjB2LRWE+Z9emkGDCrc4Va42Zs9kDxbqgzfeBLyzkdDxnr3+4ELrRsAL2Kkil/GALRDA==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"e28a4648b9bd54aaaefa967916c73eb1c3405e39f598951dd15a098508ea06d3","last_reissued_at":"2026-05-18T03:09:00.222191Z","signature_status":"signed_v1","first_computed_at":"2026-05-18T03:09:00.222191Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"JEDI: Joint Embedding Diffusion World Model for Online Model-Based Reinforcement Learning","license":"http://creativecommons.org/licenses/by/4.0/","headline":"JEDI trains an end-to-end latent diffusion world model by learning predictive latents directly from the diffusion denoising loss inside a JEPA framework.","cross_cats":[],"primary_cat":"cs.LG","authors_text":"Dianbo Liu, Haozhe Ma, Jing Yu Lim, Rushi Shah, Samson Yu, Tze-Yun Leong, Zarif Ikram","submitted_at":"2026-05-13T05:07:32Z","abstract_excerpt":"Diffusion world models have recently become competitive for online model-based reinforcement learning, but current approaches expose a tension: pixel diffusion is effective but computationally expensive while the latest latent diffusion approach improves efficiency yet performs subpar. The latter also relies on separately trained latents rather than the end-to-end world-model objectives that have driven much of modern MBRL progress. In particular, JEPA-style predictive representation learning has emerged as an especially promising direction for world modeling and MBRL. Concurrently, diffusion-"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"JEDI is the first online end-to-end latent diffusion world model. It learns its latent space directly from the diffusion denoising loss with a JEPA framework... Empirically, JEDI is competitive on Atari100k and outperforms the baseline with separately trained latents... JEDI uses 43% less VRAM, over 3× faster world-model sampling, and 2.5× faster training.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That training latents end-to-end from the diffusion denoising loss inside the JEPA framework avoids the predictive information bottleneck of conventional JEPA objectives and yields representations that are both predictive and efficient for online MBRL.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"JEDI is the first online end-to-end latent diffusion world model that trains latents from denoising loss rather than reconstruction, achieving competitive Atari100k results with 43% less VRAM and over 3x faster sampling than pixel diffusion baselines.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"JEDI trains an end-to-end latent diffusion world model by learning predictive latents directly from the diffusion denoising loss inside a JEPA framework.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"7d1eae1b6c2355c68ef4ff630106bb170121a9efd20f0ff41511c5348a33c49a"},"source":{"id":"2605.13013","kind":"arxiv","version":1},"verdict":{"id":"49b55a24-5e39-4686-aa49-6934e6fa04c0","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-14T19:32:52.755384Z","strongest_claim":"JEDI is the first online end-to-end latent diffusion world model. It learns its latent space directly from the diffusion denoising loss with a JEPA framework... Empirically, JEDI is competitive on Atari100k and outperforms the baseline with separately trained latents... JEDI uses 43% less VRAM, over 3× faster world-model sampling, and 2.5× faster training.","one_line_summary":"JEDI is the first online end-to-end latent diffusion world model that trains latents from denoising loss rather than reconstruction, achieving competitive Atari100k results with 43% less VRAM and over 3x faster sampling than pixel diffusion baselines.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That training latents end-to-end from the diffusion denoising loss inside the JEPA framework avoids the predictive information bottleneck of conventional JEPA objectives and yields representations that are both predictive and efficient for online MBRL.","pith_extraction_headline":"JEDI trains an end-to-end latent diffusion world model by learning predictive latents directly from the diffusion denoising loss inside a JEPA framework."},"references":{"count":92,"sample":[{"doi":"","year":1991,"title":"Dyna, an integrated architecture for learning, planning, and reacting.ACM Sigart Bulletin, 2(4):160–163","work_id":"2a06060a-dd49-459d-85a1-a9fa67d39225","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2018,"title":"World Models","work_id":"07227eee-8445-4c98-bce4-c6a6fd5ed907","ref_index":2,"cited_arxiv_id":"1803.10122","is_internal_anchor":true},{"doi":"","year":1912,"title":"Dream to Control: Learning Behaviors by Latent Imagination","work_id":"5103f4be-344a-4139-8504-eaa59f5bac9d","ref_index":3,"cited_arxiv_id":"1912.01603","is_internal_anchor":true},{"doi":"","year":2024,"title":"Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models","work_id":"49aeafea-5763-4b8e-aff4-8fa617aacb1f","ref_index":4,"cited_arxiv_id":"2402.17177","is_internal_anchor":true},{"doi":"","year":2024,"title":"Genie 2: A large-scale foundation world model.URL: https://deepmind","work_id":"fcab14e2-eeb9-4afc-bc33-92be8d7b91f2","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":92,"snapshot_sha256":"34883f1dff7f160bc777f569bab27da253c9352eac4f7f8e9cfbcf332bf7af1f","internal_anchors":17},"formal_canon":{"evidence_count":2,"snapshot_sha256":"2cf56e1b7768a37a95cb0d8642700f5f94d4c08b7e9428ae67878f4cda7de593"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2605.13013","created_at":"2026-05-18T03:09:00.222339+00:00"},{"alias_kind":"arxiv_version","alias_value":"2605.13013v1","created_at":"2026-05-18T03:09:00.222339+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2605.13013","created_at":"2026-05-18T03:09:00.222339+00:00"},{"alias_kind":"pith_short_12","alias_value":"4KFEMSFZXVKK","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"4KFEMSFZXVKKVLX2","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"4KFEMSFZ","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":0,"internal_anchor_count":0,"sample":[]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/4KFEMSFZXVKKVLX2SZ4RNRZ6WH","json":"https://pith.science/pith/4KFEMSFZXVKKVLX2SZ4RNRZ6WH.json","graph_json":"https://pith.science/api/pith-number/4KFEMSFZXVKKVLX2SZ4RNRZ6WH/graph.json","events_json":"https://pith.science/api/pith-number/4KFEMSFZXVKKVLX2SZ4RNRZ6WH/events.json","paper":"https://pith.science/paper/4KFEMSFZ"},"agent_actions":{"view_html":"https://pith.science/pith/4KFEMSFZXVKKVLX2SZ4RNRZ6WH","download_json":"https://pith.science/pith/4KFEMSFZXVKKVLX2SZ4RNRZ6WH.json","view_paper":"https://pith.science/paper/4KFEMSFZ","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2605.13013&json=true","fetch_graph":"https://pith.science/api/pith-number/4KFEMSFZXVKKVLX2SZ4RNRZ6WH/graph.json","fetch_events":"https://pith.science/api/pith-number/4KFEMSFZXVKKVLX2SZ4RNRZ6WH/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/4KFEMSFZXVKKVLX2SZ4RNRZ6WH/action/timestamp_anchor","attest_storage":"https://pith.science/pith/4KFEMSFZXVKKVLX2SZ4RNRZ6WH/action/storage_attestation","attest_author":"https://pith.science/pith/4KFEMSFZXVKKVLX2SZ4RNRZ6WH/action/author_attestation","sign_citation":"https://pith.science/pith/4KFEMSFZXVKKVLX2SZ4RNRZ6WH/action/citation_signature","submit_replication":"https://pith.science/pith/4KFEMSFZXVKKVLX2SZ4RNRZ6WH/action/replication_record"}},"created_at":"2026-05-18T03:09:00.222339+00:00","updated_at":"2026-05-18T03:09:00.222339+00:00"}