{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2026:3WCZOPNLJESLCIYOOTKNPG4SY5","short_pith_number":"pith:3WCZOPNL","schema_version":"1.0","canonical_sha256":"dd85973dab4924b1230e74d4d79b92c766ae3cc8f3c755bb50c38597b8272a8d","source":{"kind":"arxiv","id":"2605.15196","version":1},"attestation_state":"computed","paper":{"title":"RefDecoder: Enhancing Visual Generation with Conditional Video Decoding","license":"http://creativecommons.org/licenses/by-sa/4.0/","headline":"RefDecoder adds reference-image conditioning to video VAE decoders through attention, yielding up to 2.1 dB PSNR gains and better consistency on I2V, editing, and style-transfer tasks.","cross_cats":["cs.LG"],"primary_cat":"cs.CV","authors_text":"Bohan Fang, Ranjay Krishna, Xiang Fan, Yuheng Wang, Zhongzheng Ren","submitted_at":"2026-05-14T17:59:52Z","abstract_excerpt":"Video generation powers a vast array of downstream applications. However, while the de facto standard, i.e., latent diffusion models, typically employ heavily conditioned denoising networks, their decoders often remain unconditional. We observe that this architectural asymmetry leads to significant loss of detail and inconsistency relative to the input image. To address this, we argue that the decoder requires equal conditioning to preserve structural integrity. We introduce RefDecoder, a reference-conditioned video VAE decoder by injecting high-fidelity reference image signal directly into th"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":false,"formal_links_present":true},"canonical_record":{"source":{"id":"2605.15196","kind":"arxiv","version":1},"metadata":{"license":"http://creativecommons.org/licenses/by-sa/4.0/","primary_cat":"cs.CV","submitted_at":"2026-05-14T17:59:52Z","cross_cats_sorted":["cs.LG"],"title_canon_sha256":"877ed8b8f595c87067d2944d4d25e4feb3db485a9683b647abc3b1396daad233","abstract_canon_sha256":"df03884989f4ed1d6106661a2fcf979c3db31a6fc5086a6be40cf1ae8869b776"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.2","canonical_sha256":"dd85973dab4924b1230e74d4d79b92c766ae3cc8f3c755bb50c38597b8272a8d","last_reissued_at":"2026-05-17T21:57:18.416489Z","signature_status":"unsigned_v0","first_computed_at":"2026-05-17T21:40:25.018212Z"},"graph_snapshot":{"paper":{"title":"RefDecoder: Enhancing Visual Generation with Conditional Video Decoding","license":"http://creativecommons.org/licenses/by-sa/4.0/","headline":"RefDecoder adds reference-image conditioning to video VAE decoders through attention, yielding up to 2.1 dB PSNR gains and better consistency on I2V, editing, and style-transfer tasks.","cross_cats":["cs.LG"],"primary_cat":"cs.CV","authors_text":"Bohan Fang, Ranjay Krishna, Xiang Fan, Yuheng Wang, Zhongzheng Ren","submitted_at":"2026-05-14T17:59:52Z","abstract_excerpt":"Video generation powers a vast array of downstream applications. However, while the de facto standard, i.e., latent diffusion models, typically employ heavily conditioned denoising networks, their decoders often remain unconditional. We observe that this architectural asymmetry leads to significant loss of detail and inconsistency relative to the input image. To address this, we argue that the decoder requires equal conditioning to preserve structural integrity. We introduce RefDecoder, a reference-conditioned video VAE decoder by injecting high-fidelity reference image signal directly into th"},"claims":{"count":3,"items":[{"kind":"strongest_claim","text":"We introduce RefDecoder, a reference-conditioned video VAE decoder by injecting high-fidelity reference image signal directly into the decoding process via reference attention... achieving up to +2.1dB PSNR over the unconditional baselines on the Inter4K, WebVid, and Large Motion reconstruction benchmarks.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That equal conditioning of the decoder via reference attention is sufficient to preserve structural integrity without introducing new artifacts or requiring any fine-tuning of the rest of the pipeline.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"RefDecoder adds reference-image conditioning to video VAE decoders through attention, yielding up to 2.1 dB PSNR gains and better consistency on I2V, editing, and style-transfer tasks.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"}],"snapshot_sha256":"1e6d0f547894b1293e112af18406e9e0dfd045446e49890a615bddb829953c42"},"source":{"id":"2605.15196","kind":"arxiv","version":1},"verdict":{"id":"4c7596fd-74e6-49d4-9265-0787753f1d19","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T03:17:21.949834Z","strongest_claim":"We introduce RefDecoder, a reference-conditioned video VAE decoder by injecting high-fidelity reference image signal directly into the decoding process via reference attention... achieving up to +2.1dB PSNR over the unconditional baselines on the Inter4K, WebVid, and Large Motion reconstruction benchmarks.","one_line_summary":"RefDecoder adds reference-image conditioning to video VAE decoders through attention, yielding up to 2.1 dB PSNR gains and better consistency on I2V, editing, and style-transfer tasks.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That equal conditioning of the decoder via reference attention is sufficient to preserve structural integrity without introducing new artifacts or requiring any fine-tuning of the rest of the pipeline.","pith_extraction_headline":""},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":1,"snapshot_sha256":"6380e121a5303fa3071acc26193bf3050185e00176bb4bad61cce154b93d7be0"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2605.15196","created_at":"2026-05-17T21:18:32.720559+00:00"},{"alias_kind":"arxiv_version","alias_value":"2605.15196v1","created_at":"2026-05-17T21:18:32.720559+00:00"},{"alias_kind":"pith_short_12","alias_value":"3WCZOPNLJESL","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"3WCZOPNLJESLCIYO","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"3WCZOPNL","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":0,"internal_anchor_count":0,"sample":[]},"formal_canon":{"evidence_count":1,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/3WCZOPNLJESLCIYOOTKNPG4SY5","json":"https://pith.science/pith/3WCZOPNLJESLCIYOOTKNPG4SY5.json","graph_json":"https://pith.science/api/pith-number/3WCZOPNLJESLCIYOOTKNPG4SY5/graph.json","events_json":"https://pith.science/api/pith-number/3WCZOPNLJESLCIYOOTKNPG4SY5/events.json","paper":"https://pith.science/paper/3WCZOPNL"},"agent_actions":{"view_html":"https://pith.science/pith/3WCZOPNLJESLCIYOOTKNPG4SY5","download_json":"https://pith.science/pith/3WCZOPNLJESLCIYOOTKNPG4SY5.json","view_paper":"https://pith.science/paper/3WCZOPNL","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2605.15196&json=true","fetch_graph":"https://pith.science/api/pith-number/3WCZOPNLJESLCIYOOTKNPG4SY5/graph.json","fetch_events":"https://pith.science/api/pith-number/3WCZOPNLJESLCIYOOTKNPG4SY5/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/3WCZOPNLJESLCIYOOTKNPG4SY5/action/timestamp_anchor","attest_storage":"https://pith.science/pith/3WCZOPNLJESLCIYOOTKNPG4SY5/action/storage_attestation","attest_author":"https://pith.science/pith/3WCZOPNLJESLCIYOOTKNPG4SY5/action/author_attestation","sign_citation":"https://pith.science/pith/3WCZOPNLJESLCIYOOTKNPG4SY5/action/citation_signature","submit_replication":"https://pith.science/pith/3WCZOPNLJESLCIYOOTKNPG4SY5/action/replication_record"}},"created_at":"2026-05-17T21:18:32.720559+00:00","updated_at":"2026-05-17T21:57:18.416567+00:00"}