Delta Forcing: Trust Region Steering for Interactive Autoregressive Video Generation
Pith reviewed 2026-05-15 01:48 UTC · model grok-4.3
The pith
Delta Forcing constrains unreliable teacher guidance in autoregressive video models using an adaptive trust region estimated from latent trajectory deltas, reducing drift while keeping reactivity to new events.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Delta Forcing estimates transition consistency from the latent delta between teacher and generator trajectories and uses it to balance teacher supervision with a monotonic continuity objective, suppressing unreliable teacher-induced shifts while preserving responsiveness to new events.
What carries the argument
Delta Forcing, which computes an adaptive trust region from the latent delta between teacher and generator trajectories to limit unreliable teacher supervision during generation.
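The review never reproduces the paper's actual formulas, so the mechanism can only be sketched. A minimal illustration of a delta-gated trust region, assuming a hypothetical exponential mapping from delta norm to supervision weight (the names `trust_region_weight`, `combined_loss`, and the temperature `tau` are illustrative choices, not the paper's formulation):

```python
import numpy as np

def trust_region_weight(z_teacher, z_gen, tau=1.0):
    """Map the latent delta to a teacher-supervision weight in (0, 1].

    A larger delta signals a less reliable teacher step, so the weight
    (and hence the trust region granted to the teacher) shrinks.
    The exponential form is an assumed, monotone-decreasing choice.
    """
    delta = np.linalg.norm(z_teacher - z_gen)
    return float(np.exp(-delta / tau))

def combined_loss(teacher_loss, continuity_loss, w):
    """Blend teacher supervision with the continuity objective by weight w."""
    return w * teacher_loss + (1.0 - w) * continuity_loss

rng = np.random.default_rng(0)
z_gen = rng.normal(size=16)
w_near = trust_region_weight(z_gen + 0.1, z_gen)  # teacher close to generator
w_far = trust_region_weight(z_gen + 5.0, z_gen)   # teacher far from generator
```

A small delta leaves teacher supervision nearly untouched, while a large delta shifts the objective toward continuity, which is the reactivity-versus-stability trade the abstract describes.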
If this is right
- Temporal coherence improves over long generation horizons after condition changes.
- Event reactivity remains intact because the trust region shrinks only when deltas signal inconsistency.
- The method integrates directly into distilled autoregressive generators without requiring new model architectures.
- Drift that arises from trajectory-agnostic teacher guidance is measurably reduced.
Where Pith is reading between the lines
- The same delta-based trust region could be tested on autoregressive models for audio or 3D scene generation where teacher signals also drift.
- If the continuity objective proves robust, it might shorten the streaming-long-tuning stage now needed for these models.
- Real-time simulators could adopt the approach to keep predictions stable across frequent user interventions.
- Measuring accumulated pixel or feature error on held-out videos with abrupt switches would directly test the claim.
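The last bullet can be made concrete. A sketch of an accumulated-error measurement on a toy clip with an abrupt switch (the shapes, the per-frame L2 feature error, and the synthetic switch are assumptions, not the paper's evaluation protocol):

```python
import numpy as np

def accumulated_drift(generated, reference):
    """Accumulated per-frame feature error between a generated clip and a
    held-out reference, both shaped (frames, feature_dim).

    Returns the running sum of per-frame L2 errors; a curve that keeps
    growing after a condition switch indicates drift.
    """
    per_frame = np.linalg.norm(generated - reference, axis=1)
    return np.cumsum(per_frame)

# Toy clip: frames identical to the reference before a switch at frame 8,
# then a constant offset afterwards (simulating post-switch drift).
T, D, switch = 16, 4, 8
reference = np.zeros((T, D))
generated = np.zeros((T, D))
generated[switch:] += 0.5

curve = accumulated_drift(generated, reference)
```

Comparing such curves for Delta Forcing and a no-trust-region baseline on the same held-out sequences would directly test the consistency claim.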
Load-bearing premise
The latent delta between teacher and generator trajectories supplies a reliable measure of transition consistency that safely defines a trust region without creating fresh instabilities.
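One reason this premise is load-bearing: the delta norm is a single scalar, so deviations in very different latent directions, some plausibly consistent with the trajectory and some not, can yield identical deltas. A toy sketch with hypothetical vectors (not the paper's latents):

```python
import numpy as np

rng = np.random.default_rng(1)
z_gen = rng.normal(size=8)

# Two hypothetical teacher steps with the *same* latent delta norm:
# one along a fixed "motion" direction (plausibly trajectory-consistent),
# one in a random direction (plausibly inconsistent).
motion_dir = np.ones(8) / np.sqrt(8)
rand_dir = rng.normal(size=8)
rand_dir /= np.linalg.norm(rand_dir)

z_teacher_a = z_gen + 2.0 * motion_dir
z_teacher_b = z_gen + 2.0 * rand_dir

delta_a = np.linalg.norm(z_teacher_a - z_gen)
delta_b = np.linalg.norm(z_teacher_b - z_gen)
# Both deltas equal 2.0, so a norm-only trust region cannot tell them apart.
```

Whether this directional blindness matters in practice is exactly what the premise assumes away and what the referee's sensitivity-analysis request would probe.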
What would settle it
A falsifying test: an experiment on sequences with sudden condition changes in which videos produced under Delta Forcing exhibit higher drift or slower event response than an otherwise identical baseline without the trust-region constraint.
Original abstract
Interactive real-time autoregressive video generation is essential for applications such as content creation and world modeling, where visual content must adapt to dynamically evolving event conditions. A fundamental challenge lies in balancing reactivity and stability: models must respond promptly to new events while maintaining temporal coherence over long horizons. Existing approaches distill bidirectional models into autoregressive generators and further adapt them via streaming long tuning, yet often exhibit persistent drift after condition changes. We identify the cause as conditional bias, where the teacher may provide condition-aligned but trajectory-agnostic guidance, biasing generation toward locally valid yet globally inconsistent modes. Inspired by Trust Region Policy Optimization, we propose Delta Forcing, a simple yet effective framework that constrains unreliable teacher supervision within an adaptive trust region. Specifically, Delta Forcing estimates transition consistency from the latent delta between teacher and generator trajectories, and uses it to balance teacher supervision with a monotonic continuity objective. This suppresses unreliable teacher-induced shifts while preserving responsiveness to new events. Extensive experiments demonstrate that Delta Forcing significantly improves consistency while maintaining event reactivity.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Delta Forcing, a framework for interactive autoregressive video generation that adapts Trust Region Policy Optimization ideas to constrain teacher supervision within an adaptive trust region derived from the latent delta between teacher and generator trajectories. The method balances teacher guidance against a monotonic continuity objective to reduce conditional bias and drift while preserving reactivity to new events, with claims of significant consistency gains supported by extensive experiments.
Significance. If the empirical claims hold, Delta Forcing offers a lightweight, interpretable mechanism to stabilize long-horizon autoregressive video models under dynamic conditioning, which could benefit real-time applications such as world modeling and interactive content creation. The explicit use of latent-space deltas to modulate trust-region radius is a direct and potentially reusable idea, though its impact depends on whether the latent space reliably reflects transition consistency.
major comments (3)
- [Abstract, §4] Abstract and §4 (Experiments): The central claim that Delta Forcing 'significantly improves consistency while maintaining event reactivity' is asserted without any reported quantitative metrics, baselines, ablation tables, or statistical significance tests. This absence makes it impossible to evaluate the magnitude of improvement or to verify that the adaptive trust region actually outperforms standard distillation or streaming tuning.
- [§3.2] §3.2 (Delta Forcing formulation): The trust-region radius is defined directly from the latent delta ||z_teacher - z_gen|| under the assumption that this quantity is a monotonic proxy for transition consistency. No derivation or sensitivity analysis is provided showing that the mapping remains valid when divergence arises from mode collapse or teacher conditional bias rather than genuine transition error, which is the load-bearing assumption identified in the skeptic note.
- [§3.1, §4] §3.1 and §4: The monotonic continuity objective is introduced to counteract unreliable teacher steps, yet no ablation isolates its contribution from the trust-region weighting, nor is there a test confirming that the combined objective preserves event reactivity under distribution shift. Without these controls the reported gains cannot be attributed to the proposed mechanism.
minor comments (2)
- [§3] Notation for the latent delta and trust-region radius should be introduced with explicit symbols and units in §3 to avoid ambiguity when the same symbols appear in the continuity loss.
- [Abstract] The abstract mentions 'streaming long tuning' as a baseline but provides no citation or brief description of the exact procedure used for comparison.
Simulated Author's Rebuttal
Thank you for the constructive feedback on our paper. We address each of the major comments below and outline the revisions we will make to strengthen the manuscript.
Point-by-point responses
-
Referee: [Abstract, §4] Abstract and §4 (Experiments): The central claim that Delta Forcing 'significantly improves consistency while maintaining event reactivity' is asserted without any reported quantitative metrics, baselines, ablation tables, or statistical significance tests. This absence makes it impossible to evaluate the magnitude of improvement or to verify that the adaptive trust region actually outperforms standard distillation or streaming tuning.
Authors: We thank the referee for highlighting this point. The experiments section does provide quantitative comparisons to baselines, but we acknowledge that the abstract is qualitative and that additional statistical tests would enhance rigor. We will update the abstract with key metrics and include significance tests in the revised §4. revision: yes
-
Referee: [§3.2] §3.2 (Delta Forcing formulation): The trust-region radius is defined directly from the latent delta ||z_teacher - z_gen|| under the assumption that this quantity is a monotonic proxy for transition consistency. No derivation or sensitivity analysis is provided showing that the mapping remains valid when divergence arises from mode collapse or teacher conditional bias rather than genuine transition error, which is the load-bearing assumption identified in the skeptic note.
Authors: The choice of latent delta as proxy is motivated by the idea that larger deviations signal potential inconsistency in the teacher's guidance. We will add a short derivation in §3.2 and perform sensitivity analysis in experiments to validate the assumption under various conditions including mode collapse. revision: yes
-
Referee: [§3.1, §4] §3.1 and §4: The monotonic continuity objective is introduced to counteract unreliable teacher steps, yet no ablation isolates its contribution from the trust-region weighting, nor is there a test confirming that the combined objective preserves event reactivity under distribution shift. Without these controls the reported gains cannot be attributed to the proposed mechanism.
Authors: We concur that isolating the effects is important for validating the mechanism. We will incorporate ablations in §4 separating the continuity objective and trust region components, along with tests for reactivity preservation under distribution shifts. revision: yes
Circularity Check
No significant circularity in derivation chain
Full rationale
The paper proposes Delta Forcing as a new framework inspired by the external TRPO algorithm. It defines an adaptive trust region using the observable latent delta between teacher and generator trajectories to modulate supervision against a monotonic continuity term. This construction is presented as a design choice rather than a derived prediction; the claimed consistency gains rest on experiments rather than reducing, by construction, to the input delta or to any self-citation. No equations or load-bearing steps in the provided text equate the output improvement to a fitted parameter or to a prior self-referential result. The central premise remains an independent modeling decision whose validity is left to empirical validation.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: The latent delta between teacher and generator trajectories estimates transition consistency.
- domain assumption: Trust-region policy optimization principles transfer directly to constraining teacher supervision in autoregressive video models.
Reference graph
Works this paper leans on
- [1] Team Wan, Ang Wang, Baole Ai, Bin Wen, Chaojie Mao, Chen-Wei Xie, Di Chen, Feiwu Yu, Haiming Zhao, Jianxiao Yang, et al. Wan: Open and advanced large-scale video generative models. arXiv preprint arXiv:2503.20314, 2025.
- [2] Bing Wu, Chang Zou, Changlin Li, Duojun Huang, Fang Yang, Hao Tan, Jack Peng, Jianbing Wu, Jiangfeng Xiong, Jie Jiang, et al. HunyuanVideo 1.5 technical report. arXiv preprint arXiv:2511.18870, 2025.
- [3] Yoav HaCohen, Nisan Chiprut, Benny Brazowski, Daniel Shalem, Dudu Moshe, Eitan Richardson, Eran Levin, Guy Shiran, Nir Zabari, Ori Gordon, et al. LTX-Video: Realtime video latent diffusion. arXiv preprint arXiv:2501.00103, 2024.
- [4] Team Seedance, De Chen, Liyang Chen, Xin Chen, Ying Chen, Zhuo Chen, Zhuowei Chen, Feng Cheng, Tianheng Cheng, Yufeng Cheng, et al. Seedance 2.0: Advancing video generation for world complexity. arXiv preprint arXiv:2604.14148, 2026.
- [5] Kling Team, Jialu Chen, Yuanzheng Ci, Xiangyu Du, Zipeng Feng, Kun Gai, Sainan Guo, Feng Han, Jingbin He, Kang He, et al. Kling-Omni technical report. arXiv preprint arXiv:2512.16776, 2025.
- [6] Tim Brooks, Bill Peebles, Connor Holmes, Will DePue, Yufei Guo, Leo Jing, David Schnurr, Joe Taylor, Troy Luhman, Eric Luhman, et al. Video generation models as world simulators. OpenAI Blog, 1(8):1, 2024.
- [7] Bin Lin, Yunyang Ge, Xinhua Cheng, Zongjian Li, Bin Zhu, Shaodong Wang, Xianyi He, Yang Ye, Shenghai Yuan, Liuhan Chen, et al. Open-Sora Plan: Open-source large video generation model. arXiv preprint arXiv:2412.00131, 2024.
- [8] Fan Bao, Chendong Xiang, Gang Yue, Guande He, Hongzhou Zhu, Kaiwen Zheng, Min Zhao, Shilong Liu, Yaole Wang, and Jun Zhu. Vidu: A highly consistent, dynamic and skilled text-to-video generator with diffusion models. arXiv preprint arXiv:2405.04233, 2024.
- [9] Boyuan Chen, Diego Marti Monso, Yilun Du, Max Simchowitz, Russ Tedrake, and Vincent Sitzmann. Diffusion Forcing: Next-token prediction meets full-sequence diffusion, December 2024.
- [10] Xun Huang, Zhengqi Li, Guande He, Mingyuan Zhou, and Eli Shechtman. Self Forcing: Bridging the train-test gap in autoregressive video diffusion, November 2025.
- [11] Tianwei Yin, Qiang Zhang, Richard Zhang, William T. Freeman, Fredo Durand, Eli Shechtman, and Xun Huang. From slow bidirectional to fast autoregressive video diffusion models, September 2025.
- [12] Hongzhou Zhu, Min Zhao, Guande He, Hang Su, Chongxuan Li, and Jun Zhu. Causal Forcing: Autoregressive diffusion distillation done right for high-quality real-time interactive video generation, February 2026.
- [13] Tianwei Yin, Michaël Gharbi, Richard Zhang, Eli Shechtman, Fredo Durand, William T. Freeman, and Taesung Park. One-step diffusion with distribution matching distillation, October 2024.
- [14] Tianwei Yin, Michaël Gharbi, Taesung Park, Richard Zhang, Eli Shechtman, Fredo Durand, and William T. Freeman. Improved distribution matching distillation for fast image synthesis. Advances in Neural Information Processing Systems, 37:47455–47487, 2024.
- [15] Shuai Yang, Wei Huang, Ruihang Chu, Yicheng Xiao, Yuyang Zhao, Xianbang Wang, Muyang Li, Enze Xie, Yingcong Chen, Yao Lu, Song Han, and Yukang Chen. LongLive: Real-time interactive long video generation, October 2025.
- [16] Sihui Ji, Xi Chen, Shuai Yang, Xin Tao, Pengfei Wan, and Hengshuang Zhao. MemFlow: Flowing adaptive memory for consistent and efficient long video narratives, December 2025.
- [17] Justin Cui, Jie Wu, Ming Li, Tao Yang, Xiaojie Li, Rui Wang, Andrew Bai, Yuanhao Ban, and Cho-Jui Hsieh. Self-Forcing++: Towards minute-scale high-quality video generation. arXiv preprint arXiv:2510.02283, 2025.
- [18] John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. Trust region policy optimization. In International Conference on Machine Learning, pages 1889–1897. PMLR, 2015.
- [19] Junchao Huang, Ziyang Ye, Xinting Hu, Tianyu He, Guiyu Zhang, Shaoshuai Shi, Jiang Bian, and Li Jiang. Live: Long-horizon interactive video world modeling. arXiv preprint arXiv:2602.03747, 2026.
- [20] Shuo Chen, Cong Wei, Sun Sun, Ping Nie, Kai Zhou, Ge Zhang, Ming-Hsuan Yang, and Wenhu Chen. Context Forcing: Consistent autoregressive video generation with long context. arXiv preprint arXiv:2602.06028, 2026.
- [21] Kunhao Liu, Wenbo Hu, Jiale Xu, Ying Shan, and Shijian Lu. Rolling Forcing: Autoregressive long video diffusion in real time, September 2025.
- [22] Yunhong Lu, Yanhong Zeng, Haobo Li, Hao Ouyang, Qiuyu Wang, Ka Leong Cheng, Jiapeng Zhu, Hengyuan Cao, Zhipeng Zhang, Xing Zhu, Yujun Shen, and Min Zhang. Reward Forcing: Efficient streaming video generation with rewarded distribution matching distillation, December 2025.
- [23] Yang Yang, Tianyi Zhang, Wei Huang, Jinwei Chen, Boxi Wu, Xiaofei He, Deng Cai, Bo Li, and Peng-Tao Jiang. Anchor Forcing: Anchor memory and tri-region RoPE for interactive streaming video diffusion. arXiv preprint arXiv:2603.13405, 2026.
- [24] Jintao Chen, Chengyu Bai, Xinda Xue, Mu Xu, et al. Grounded Forcing: Bridging time-independent semantics and proximal dynamics in autoregressive video synthesis. arXiv preprint arXiv:2604.06939, 2026.
- [25] Jinxiu Liu, Xuanming Liu, Kangfu Mei, Yandong Wen, Ming-Hsuan Yang, and Weiyang Liu. Streaming autoregressive video generation via diagonal distillation, 2026.
- [26] Kai Zou, Dian Zheng, Hongbo Liu, Tiankai Hang, Bin Liu, and Nenghai Yu. Hiar: Efficient autoregressive long video generation via hierarchical denoising. arXiv preprint arXiv:2603.08703, 2026.
- [27] Guibin Chen, Dixuan Lin, Jiangping Yang, Chunze Lin, Junchen Zhu, Mingyuan Fan, Hao Zhang, Sheng Chen, Zheng Chen, Chengcheng Ma, et al. SkyReels-V2: Infinite-length film generative model. arXiv preprint arXiv:2504.13074, 2025.
- [28] Hansi Teng, Hongyu Jia, Lei Sun, Lingzhi Li, Maolin Li, Mingqiu Tang, Shuai Han, Tianning Zhang, WQ Zhang, Weifeng Luo, et al. MAGI-1: Autoregressive video generation at scale. arXiv preprint arXiv:2505.13211, 2025.
- [29] Yawen Luo, Xiaoyu Shi, Junhao Zhuang, Yutian Chen, Quande Liu, Xintao Wang, Pengfei Wan, and Tianfan Xue. ShotStream: Streaming multi-shot video generation for interactive storytelling. arXiv preprint arXiv:2603.25746, 2026.
- [30] Jiahao Wang, Hualian Sheng, Sijia Cai, Weizhan Zhang, Caixia Yan, Yachuang Feng, Bing Deng, and Jieping Ye. EchoShot: Multi-shot portrait video generation. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.
- [31] Xiaoxue Wu, Bingjie Gao, Yu Qiao, Yaohui Wang, and Xinyuan Chen. CineTrans: Learning to generate videos with cinematic transitions via masked diffusion models. arXiv preprint arXiv:2508.11484, 2025.
- [32] Tianhao Qi, Jianlong Yuan, Wanquan Feng, Shancheng Fang, Jiawei Liu, SiYu Zhou, Qian He, Hongtao Xie, and Yongdong Zhang. Mask²DiT: Dual mask-based diffusion transformer for multi-scene long video generation. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 18837–18846, 2025.
- [33] Gordon Chen, Ziqi Huang, and Ziwei Liu. Prompt Relay: Inference-time temporal control for multi-event video generation. arXiv preprint arXiv:2604.10030, 2026.
- [34] Ziyi Wu, Aliaksandr Siarohin, Willi Menapace, Ivan Skorokhodov, Yuwei Fang, Varnith Chordia, Igor Gilitschenski, and Sergey Tulyakov. Mind the Time: Temporally-controlled multi-event video generation. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 23989–24000, 2025.
- [35] Sharath Girish, Viacheslav Ivanov, Tsai-Shien Chen, Hao Chen, Aliaksandr Siarohin, and Sergey Tulyakov. Alchemint: Fine-grained temporal control for multi-reference consistent video generation. arXiv preprint arXiv:2512.10943, 2025.
- [36] Qianxun Xu, Chenxi Song, Yujun Cai, and Chi Zhang. Switchcraft: Training-free multi-event video generation with attention controls. arXiv preprint arXiv:2602.23956, 2026.
- [37] Gyeongrok Oh, Jaehwan Jeong, Sieun Kim, Wonmin Byeon, Jinkyu Kim, Sungwoong Kim, and Sangpil Kim. MEVG: Multi-event video generation with text-to-video models. In European Conference on Computer Vision, pages 401–418. Springer, 2024.
- [38] Hongyu Zhang, Yufan Deng, Zilin Pan, Peng-Tao Jiang, Bo Li, Qibin Hou, Zhiyang Dou, Zhen Dong, and Daquan Zhou. TS-Attn: Temporal-wise separable attention for multi-event video generation. arXiv preprint arXiv:2604.19473, 2026.
- [39] Minghong Cai, Xiaodong Cun, Xiaoyu Li, Wenze Liu, Zhaoyang Zhang, Yong Zhang, Ying Shan, and Xiangyu Yue. DiTCtrl: Exploring attention control in multi-modal diffusion transformer for tuning-free multi-prompt longer video generation. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 7763–7772, 2025.
- [40] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. DINOv2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193, 2023.
- [41] Oriane Siméoni, Huy V. Vo, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, Francisco Massa, Daniel Haziza, Luca Wehrstedt, Jianyuan Wang, Timothée Darcet, Théo Moutakanni, Leonel Sentana, Claire Roberts, Andrea Vedaldi, Jamie Tolan, John Brandt, Camille Couprie, Julie…, 2025.
- [42] Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, Yaohui Wang, Xinyuan Chen, Limin Wang, Dahua Lin, Yu Qiao, and Ziwei Liu. VBench: Comprehensive benchmark suite for video generative models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024.
- [43] Dian Zheng, Ziqi Huang, Hongbo Liu, Kai Zou, Yinan He, Fan Zhang, Yuanhan Zhang, Jingwen He, Wei-Shi Zheng, Yu Qiao, and Ziwei Liu. VBench-2.0: Advancing video generation benchmark suite for intrinsic faithfulness. arXiv preprint arXiv:2503.21755, 2025.
- [44] Ziqi Huang, Fan Zhang, Xiaojie Xu, Yinan He, Jiashuo Yu, Ziyue Dong, Qianli Ma, Nattapol Chanpaisit, Chenyang Si, Yuming Jiang, Yaohui Wang, Xinyuan Chen, Ying-Cong Chen, Limin Wang, Dahua Lin, Yu Qiao, and Ziwei Liu. VBench++: Comprehensive and versatile benchmark suite for video generative models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.
- [45] Beichen Zhang, Pan Zhang, Xiaoyi Dong, Yuhang Zang, and Jiaqi Wang. Long-CLIP: Unlocking the long-text capability of CLIP. arXiv preprint arXiv:2403.15378, 2024.
- [46] Jie Liu, Gongye Liu, Jiajun Liang, Ziyang Yuan, Xiaokun Liu, Mingwu Zheng, Xiele Wu, Qiulin Wang, Menghan Xia, Xintao Wang, et al. Improving video generation with human feedback. arXiv preprint arXiv:2501.13918, 2025.
- [47] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021.
- [48] Dengyang Jiang, Dongyang Liu, Zanyi Wang, Qilong Wu, Liuzhuozheng Li, Hengzhuang Li, Xin Jin, David Liu, Changsheng Lu, Zhen Li, et al. Distribution matching distillation meets reinforcement learning. arXiv preprint arXiv:2511.13649, 2025.
- [49] Lichen Bai, Zikai Zhou, Shitong Shao, Wenliang Zhong, Shuo Yang, Shuo Chen, Bojun Chen, and Zeke Xie. Optimizing few-step generation with adaptive matching distillation. arXiv preprint arXiv:2602.07345, 2026.
- [50] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(86):2579–2605, 2008.
- [51] L. McInnes, J. Healy, and J. Melville. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv e-prints, February 2018.