PhyAVBench provides the first systematic benchmark and metric for audio-physics grounding in T2AV, I2AV, and V2A models using controlled prompt pairs and real video ground truth.
Audio set: An ontology and human-labeled dataset for audio events,
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
verdicts
UNVERDICTED 2representative citing papers
A timing-aware pre-quantization fusion approach integrates visual cues into audio tokenizers along the temporal axis, maintaining reconstruction quality while outperforming audio-only and prior multimodal baselines on downstream tasks.
citing papers explorer
-
PhyAVBench: A Challenging Audio Physics-Sensitivity Benchmark for Physically Grounded Text-to-Audio-Video Generation
PhyAVBench provides the first systematic benchmark and metric for audio-physics grounding in T2AV, I2AV, and V2A models using controlled prompt pairs and real video ground truth.
-
Why Your Tokenizer Fails in Information Fusion: A Timing-Aware Pre-Quantization Fusion for Video-Enhanced Audio Tokenization
A timing-aware pre-quantization fusion approach integrates visual cues into audio tokenizers along the temporal axis, maintaining reconstruction quality while outperforming audio-only and prior multimodal baselines on downstream tasks.