OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model

Liuhan Chen, Zongjian Li, Bin Lin, Bin Zhu, Qian Wang, Shenghai Yuan, Xing Zhou, Xinghua Cheng, Li Yuan · 2024 · arXiv 2409.01199

6 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Efficient Video Diffusion Models: Advancements and Challenges

cs.CV · 2026-04-17 · unverdicted · novelty 7.0

A survey that groups efficient video diffusion methods into four paradigms—step distillation, efficient attention, model compression, and cache/trajectory optimization—and outlines open challenges for practical use.

Modality-Aware and Anatomical Vector-Quantized Autoencoding for Multimodal Brain MRI

cs.CV · 2026-04-06 · unverdicted · novelty 7.0

NeuroQuant is a modality-aware 3D VQ-VAE that uses dual-stream encoding, a shared anatomical codebook, and FiLM to achieve superior multi-modal brain MRI reconstruction.

ChopGrad: Pixel-Wise Losses for Latent Video Diffusion via Truncated Backpropagation

cs.CV · 2026-03-18 · unverdicted · novelty 7.0

ChopGrad truncates backpropagation to local frame windows in video diffusion models, reducing memory from linear in frame count to constant while enabling pixel-wise loss fine-tuning.

Task-Oriented Communication for Human Action Understanding via Edge-Cloud Co-Inference

eess.SP · 2026-05-08 · unverdicted · novelty 5.0

TOAU compresses human motion videos to 9 bits per frame using pose estimation and a VQ-VAE, then aligns the resulting tokens to a vision-language model via a lightweight projector, achieving 1% of the transmission payload and 20% of the latency of video codecs while maintaining comparable action understanding accuracy.

Video Generation with Predictive Latents

cs.CV · 2026-05-04 · unverdicted · novelty 5.0

PV-VAE improves video latent spaces for generation by unifying reconstruction with future-frame prediction, reporting 52% faster convergence and 34.42 FVD gain over Wan2.2 VAE on UCF101.

HunyuanVideo: A Systematic Framework For Large Video Generative Models

cs.CV · 2024-12-03 · unverdicted · novelty 5.0

HunyuanVideo presents a 13B-parameter open-source video generative model with integrated data, architecture, training, and inference systems whose professional evaluations show it outperforming prior SOTA models including Runway Gen-3 and Luma 1.6.

citing papers explorer

Showing 6 of 6 citing papers.

Efficient Video Diffusion Models: Advancements and Challenges · cs.CV · 2026-04-17 · unverdicted · none · ref 235
Modality-Aware and Anatomical Vector-Quantized Autoencoding for Multimodal Brain MRI · cs.CV · 2026-04-06 · unverdicted · none · ref 5
ChopGrad: Pixel-Wise Losses for Latent Video Diffusion via Truncated Backpropagation · cs.CV · 2026-03-18 · unverdicted · none · ref 9
Task-Oriented Communication for Human Action Understanding via Edge-Cloud Co-Inference · eess.SP · 2026-05-08 · unverdicted · none · ref 31
Video Generation with Predictive Latents · cs.CV · 2026-05-04 · unverdicted · none · ref 10
HunyuanVideo: A Systematic Framework For Large Video Generative Models · cs.CV · 2024-12-03 · unverdicted · none · ref 11
