Recognition: 2 theorem links
Low-Bitrate Video Compression through Semantic-Conditioned Diffusion
Pith reviewed 2026-05-17 03:30 UTC · model grok-4.3
The pith
DiSCo transmits only semantic text, degraded frames, and motion cues, then uses a conditional diffusion model to reconstruct high-quality video at low bitrates.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By sending only a textual semantic description, a spatiotemporally degraded video, and optional sketches or poses, a conditional video diffusion model can synthesize temporally coherent high-quality output that outperforms baseline semantic and traditional codecs by 2-10X on perceptual metrics at low bitrates.
What carries the argument
The conditional video diffusion model that reconstructs the video from the three compact multimodal inputs of text semantics, degraded appearance, and motion cues.
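What "conditioning on three compact inputs" amounts to can be sketched numerically. The shapes, names, and packing scheme below are assumptions for illustration, not DiSCo's actual architecture:

```python
import numpy as np

def build_condition(text_emb, degraded_latent, motion_cues):
    """Pack the three compact modalities into one per-frame conditioning
    tensor that a video diffusion denoiser could attend to.

    text_emb:        (d,)   embedding of the semantic text description
    degraded_latent: (t, d) per-frame latents of the degraded video
    motion_cues:     (t, d) sketch/pose features (zeros when omitted)
    """
    t = degraded_latent.shape[0]
    text_tiled = np.tile(text_emb, (t, 1))  # broadcast text to every frame
    return np.concatenate([text_tiled, degraded_latent, motion_cues], axis=1)

rng = np.random.default_rng(0)
text = rng.normal(size=4)
degraded = rng.normal(size=(8, 4))
motion = np.zeros((8, 4))  # optional sketch/pose stream left empty
cond = build_condition(text, degraded, motion)
print(cond.shape)  # → (8, 12): 8 frames, 3 modalities x 4 dims each
```

The actual model presumably injects these streams via cross-attention rather than flat concatenation; the point is only that the decoder's entire input fits in three small tensors.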
If this is right
- Usable video can be delivered at bitrates where pixel-based codecs produce only artifacts.
- Compression can prioritize semantic and motion cues over exact pixel fidelity.
- Multimodal token interleaving and temporal forward filling become practical tools for maintaining coherence.
- The same compact representation supports both compression and downstream generative editing tasks.
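Temporal forward filling is named but not defined at this level of the summary; one plausible reading, assumed here, is that sparsely transmitted conditioning frames are carried forward over the gaps between them:

```python
def forward_fill(frames):
    """Carry the last available conditioning frame forward over gaps.

    frames: list where None marks a frame with no transmitted cue.
    Returns a dense list; leading Nones stay None (nothing to fill from).
    """
    filled, last = [], None
    for f in frames:
        if f is not None:
            last = f
        filled.append(last)
    return filled

# Transmit cues only for frames 0 and 3; the decoder fills the rest.
print(forward_fill(["k0", None, None, "k3", None]))
# → ['k0', 'k0', 'k0', 'k3', 'k3']
```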
Where Pith is reading between the lines
- The method could extend to live streaming if the diffusion model is made fast enough for real-time inference.
- Similar multimodal conditioning might apply to audio or 3D scene compression where generative priors are available.
- Evaluation protocols will need new metrics that detect semantic hallucinations in addition to traditional distortion measures.
Load-bearing premise
The diffusion model must produce temporally coherent video without hallucinations or artifacts that lower perceived quality when given only the compact multimodal inputs.
What would settle it
A controlled perceptual study in which viewers rate DiSCo reconstructions against traditional codec outputs at the same low bitrate and flag any temporal inconsistencies or invented content.
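Such a paired study reduces to a preference count plus a significance test. A minimal sketch with invented tallies, using an exact two-sided binomial test from the standard library:

```python
from math import comb

def binom_two_sided_p(wins, n, p=0.5):
    """Exact two-sided binomial p-value for a paired preference test."""
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    observed = pmf[wins]
    return min(1.0, sum(v for v in pmf if v <= observed + 1e-12))

# Hypothetical tallies: out of 20 paired clips, viewers preferred the
# diffusion reconstruction over the traditional codec output 16 times.
p = binom_two_sided_p(16, 20)
print(round(p, 4))  # → 0.0118, i.e. a significant preference at alpha=0.05
```

A real protocol would also need per-clip flags for temporal inconsistency and invented content, which this win/loss count does not capture.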
Original abstract
Traditional video codecs optimized for pixel fidelity collapse at ultra-low bitrates and produce severe artifacts. This failure arises from a fundamental misalignment between pixel accuracy and human perception. We propose a semantic video compression framework named DiSCo that transmits only the most meaningful information while relying on generative priors for detail synthesis. The source video is decomposed into three compact modalities: a textual description, a spatiotemporally degraded video, and optional sketches or poses that respectively capture semantic, appearance, and motion cues. A conditional video diffusion model then reconstructs high-quality, temporally coherent videos from these compact representations. Temporal forward filling, token interleaving, and modality-specific codecs are proposed to improve multimodal generation and modality compactness. Experiments show that our method outperforms baseline semantic and traditional codecs by 2-10X on perceptual metrics at low bitrates.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes DiSCo, a semantic video compression framework that decomposes input video into three compact modalities (textual description, spatiotemporally degraded video, and optional sketches/poses) to capture semantic, appearance, and motion cues. These are transmitted at low bitrates and used to condition a video diffusion model for reconstructing high-quality, temporally coherent output. The authors introduce temporal forward filling and token interleaving to improve multimodal generation, along with modality-specific codecs. Experiments claim 2-10X gains on perceptual metrics versus baseline semantic and traditional codecs at low bitrates.
Significance. If the reported gains hold under rigorous evaluation, the work offers a meaningful shift from pixel-fidelity optimization to semantic-plus-generative reconstruction for ultra-low-bitrate video. The multimodal conditioning strategy and the specific generation techniques (temporal forward filling, token interleaving) are concrete contributions that could inform future perceptual codecs. The empirical comparisons and ablations provide initial support for the central claim.
Minor comments (3)
- Section 4.1 and Table 2: the exact perceptual metrics (e.g., LPIPS, DISTS, or user-study scores), bitrate operating points, and test sequences should be stated explicitly in the caption or main text so readers can reproduce the 2-10X claim without consulting supplementary material.
- Figure 4: the visual comparison panels would benefit from zoomed insets or difference maps to highlight artifact reduction at the lowest bitrates; current resolution makes it hard to verify the claimed perceptual superiority.
- Section 3.2: the token-interleaving procedure is described in prose; a small pseudocode block or diagram would clarify the ordering of text, video, and sketch tokens across diffusion steps.
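On the third comment, a round-robin layout is one natural guess at the interleaving order. The sketch below is an assumption for illustration, not the paper's actual scheme:

```python
def interleave_tokens(text_tokens, video_tokens, sketch_tokens):
    """Round-robin interleave: at each frame index, emit the text, video,
    and sketch token in turn (skipping modalities that ran out), so the
    denoiser sees aligned cross-modal context at every position."""
    out, streams = [], (text_tokens, video_tokens, sketch_tokens)
    for i in range(max(map(len, streams))):
        for tag, stream in zip(("T", "V", "S"), streams):
            if i < len(stream):
                out.append(f"{tag}:{stream[i]}")
    return out

print(interleave_tokens(["cat"], ["f0", "f1"], ["s0", "s1"]))
# → ['T:cat', 'V:f0', 'S:s0', 'V:f1', 'S:s1']
```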
Simulated Author's Rebuttal
We thank the referee for the positive summary of DiSCo, the recognition of its potential impact on perceptual video compression, and the recommendation for minor revision. We are pleased that the multimodal conditioning approach and techniques such as temporal forward filling and token interleaving are viewed as concrete contributions.
Circularity Check
No significant circularity
Full rationale
The manuscript is an empirical proposal for semantic video compression via conditional diffusion. It defines a multimodal input decomposition (text + degraded video + optional sketches/poses) and proposes engineering components (temporal forward filling, token interleaving, modality-specific codecs) whose value is demonstrated through quantitative comparisons and ablations against external baselines. No equations, parameter fittings, or derivations appear that reduce by construction to the paper's own inputs or prior self-citations. The central performance claims rest on externally measurable perceptual metrics rather than self-referential normalization or uniqueness theorems, rendering the work self-contained.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: Pretrained conditional video diffusion models can generate temporally coherent output from compact semantic, appearance, and motion inputs.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
The relation between the paper passage and the cited Recognition theorem is unclear. Passage: "We propose a semantic video compression framework named DiSCo that transmits only the most meaningful information while relying on generative priors for detail synthesis. ... token interleaving mechanism ... temporal forward-filling scheme"
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
The relation between the paper passage and the cited Recognition theorem is unclear. Passage: "Experiments show that our method outperforms baseline semantic and traditional codecs by 2-10X on perceptual metrics at low bitrates."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- NeuralLVC: Neural Lossless Video Compression via Masked Diffusion with Temporal Conditioning
NeuralLVC achieves better lossless compression than H.264 and H.265 on video sequences by combining masked diffusion with temporal conditioning on frame differences.