Vidtok: A versatile and open-source video tokenizer

Tang, A · 2024 · arXiv 2412.13061

2 Pith papers cite this work.

representative citing papers

AVTok: 1D Unified Tokenization for Holistic Audio-Video Generation

cs.CV · 2026-06-29 · unverdicted · novelty 6.0

AVTok is a unified tokenizer that converts audio-video pairs into a compact 1D latent representation via dual-stream transformer and hierarchical training for improved reconstruction and cross-modal generation.

Latent Wavelet Diffusion For Ultra-High-Resolution Image Synthesis

cs.CV · 2025-05-31 · unverdicted · novelty 6.0 · ref 59

Latent Wavelet Diffusion uses wavelet energy map masking and a scale-consistent VAE to improve detail fidelity in 2K-4K image generation without extra inference overhead.