InstructAV2AV is an end-to-end instruction-guided audio-video joint editing model that adapts a pre-trained backbone with gated attention and two-stage training, outperforming prior methods on 11 metrics after building the InsAVE-80K dataset.
Peavs: Perceptual evaluation of audio-visual synchrony grounded in viewers’ opinion scores,
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
verdicts
UNVERDICTED 2representative citing papers
PhyAVBench provides the first systematic benchmark and metric for audio-physics grounding in T2AV, I2AV, and V2A models using controlled prompt pairs and real video ground truth.
citing papers explorer
-
InstructAV2AV: Instruction-Guided Audio-Video Joint Editing
InstructAV2AV is an end-to-end instruction-guided audio-video joint editing model that adapts a pre-trained backbone with gated attention and two-stage training, outperforming prior methods on 11 metrics after building the InsAVE-80K dataset.
-
PhyAVBench: A Challenging Audio Physics-Sensitivity Benchmark for Physically Grounded Text-to-Audio-Video Generation
PhyAVBench provides the first systematic benchmark and metric for audio-physics grounding in T2AV, I2AV, and V2A models using controlled prompt pairs and real video ground truth.