InstructAV2AV is an end-to-end instruction-guided audio-video joint editing model that adapts a pre-trained backbone with gated attention and two-stage training, outperforming prior methods on 11 metrics after building the InsAVE-80K dataset.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CV 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
HDR video generation is achieved by logarithmically encoding HDR imagery to align with pretrained generative model latents, enabling minimal fine-tuning and degradation-based inference of missing content.
citing papers explorer
-
InstructAV2AV: Instruction-Guided Audio-Video Joint Editing
InstructAV2AV is an end-to-end instruction-guided audio-video joint editing model that adapts a pre-trained backbone with gated attention and two-stage training, outperforming prior methods on 11 metrics after building the InsAVE-80K dataset.
-
HDR Video Generation via Latent Alignment with Logarithmic Encoding
HDR video generation is achieved by logarithmically encoding HDR imagery to align with pretrained generative model latents, enabling minimal fine-tuning and degradation-based inference of missing content.