Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
citation-polarity summary: background 1
representative citing papers
VOSR shows that competitive generative image super-resolution with faithful structures can be achieved by training a diffusion-style model from scratch on visual data alone, using a vision encoder for guidance and a restoration-oriented sampling strategy.
A quantization technique for diffusion models that aligns sampling trajectories to preserve high-order sampler performance under quantization noise.
FREE-Switch dynamically switches LoRA adapters using frequency importance per diffusion step and adds semantic alignment to reduce content drift when merging specialized image generators.
ProGIC applies residual vector quantization with a lightweight CNN-attention backbone to deliver progressive generative image compression with claimed perceptual gains and over 10x faster encoding/decoding versus MS-ILLM.
citing papers explorer
- OmniSonic: Towards Universal and Holistic Audio Generation from Video and Text
  OmniSonic introduces a TriAttn-DiT architecture with MoE gating to jointly generate on-screen, off-screen, and speech audio from video and text, outperforming prior models on a new UniHAGen-Bench.
- VOSR: A Vision-Only Generative Model for Image Super-Resolution
  VOSR shows that competitive generative image super-resolution with faithful structures can be achieved by training a diffusion-style model from scratch on visual data alone, using a vision encoder for guidance and a restoration-oriented sampling strategy.
- Sampling-Aware Quantization for Diffusion Models
  A quantization technique for diffusion models that aligns sampling trajectories to preserve high-order sampler performance under quantization noise.
- FREE-Switch: Frequency-based Dynamic LoRA Switch for Style Transfer
  FREE-Switch dynamically switches LoRA adapters using frequency importance per diffusion step and adds semantic alignment to reduce content drift when merging specialized image generators.
- ProGIC: Progressive and Lightweight Generative Image Compression with Residual Vector Quantization
  ProGIC applies residual vector quantization with a lightweight CNN-attention backbone to deliver progressive generative image compression with claimed perceptual gains and over 10x faster encoding/decoding versus MS-ILLM.