{"paper":{"title":"Delta Forcing: Trust Region Steering for Interactive Autoregressive Video Generation","license":"http://creativecommons.org/licenses/by-nc-nd/4.0/","headline":"Delta Forcing constrains unreliable teacher guidance in autoregressive video models using an adaptive trust region estimated from latent trajectory deltas, reducing drift while keeping reactivity to new events.","cross_cats":["cs.GR","cs.MM"],"primary_cat":"cs.CV","authors_text":"Dongman Lee, Qing Yin, Tianhao Chen, Xiangbo Gao, Xinghao Chen, Yuheng Wu, Zhengzhong Tu","submitted_at":"2026-05-14T05:06:57Z","abstract_excerpt":"Interactive real-time autoregressive video generation is essential for applications such as content creation and world modeling, where visual content must adapt to dynamically evolving event conditions. A fundamental challenge lies in balancing reactivity and stability: models must respond promptly to new events while maintaining temporal coherence over long horizons. Existing approaches distill bidirectional models into autoregressive generators and further adapt them via streaming long tuning, yet often exhibit persistent drift after condition changes. We identify the cause as conditional bi"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Delta Forcing significantly improves consistency while maintaining event reactivity by constraining unreliable teacher supervision within an adaptive trust region estimated from latent deltas.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the latent delta between teacher and generator trajectories provides a reliable estimate of transition consistency that can be used to define a safe trust region without introducing new instabilities.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Delta Forcing uses latent trajectory deltas to adaptively limit unreliable teacher guidance while enforcing monotonic continuity, improving temporal consistency in interactive autoregressive video generation.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Delta Forcing constrains unreliable teacher guidance in autoregressive video models using an adaptive trust region estimated from latent trajectory deltas, reducing drift while keeping reactivity to new events.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"bd99dfaa1b7e9019a4eb8154556782422a720b804d9b4e20c5bacf333301de20"},"source":{"id":"2605.14382","kind":"arxiv","version":1},"verdict":{"id":"5fc18b07-c93e-4e41-95b9-daa29091169a","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T01:48:47.100871Z","strongest_claim":"Delta Forcing significantly improves consistency while maintaining event reactivity by constraining unreliable teacher supervision within an adaptive trust region estimated from latent deltas.","one_line_summary":"Delta Forcing uses latent trajectory deltas to adaptively limit unreliable teacher guidance while enforcing monotonic continuity, improving temporal consistency in interactive autoregressive video generation.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the latent delta between teacher and generator trajectories provides a reliable estimate of transition consistency that can be used to define a safe trust region without introducing new instabilities.","pith_extraction_headline":"Delta Forcing constrains unreliable teacher guidance in autoregressive video models using an adaptive trust region estimated from latent trajectory deltas, reducing drift while keeping reactivity to new events."},"references":{"count":51,"sample":[{"doi":"","year":2025,"title":"Wan: Open and Advanced Large-Scale Video Generative Models","work_id":"ad3ebc3b-4224-46c9-b61d-bcf135da0a7c","ref_index":1,"cited_arxiv_id":"2503.20314","is_internal_anchor":true},{"doi":"","year":2025,"title":"HunyuanVideo 1.5 Technical Report","work_id":"ed898a38-b053-407c-bbce-41561510c1de","ref_index":2,"cited_arxiv_id":"2511.18870","is_internal_anchor":true},{"doi":"","year":2024,"title":"LTX-Video: Realtime Video Latent Diffusion","work_id":"cee5c521-3ce9-466e-a035-1e42f89254f4","ref_index":3,"cited_arxiv_id":"2501.00103","is_internal_anchor":true},{"doi":"","year":2026,"title":"Seedance 2.0: Advancing Video Generation for World Complexity","work_id":"ceac4ea4-1ca6-4fe9-8324-880c07aec27d","ref_index":4,"cited_arxiv_id":"2604.14148","is_internal_anchor":true},{"doi":"","year":2025,"title":"Kling-Omni Technical Report","work_id":"52d502bd-9d8e-4944-9bf2-cfd097cfdb4e","ref_index":5,"cited_arxiv_id":"2512.16776","is_internal_anchor":true}],"resolved_work":51,"snapshot_sha256":"be5001474683dad1e99d07db0e631320ac0a2acd0cb165072046716ce76a0a91","internal_anchors":14},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}