FireRedASR2S: A state-of-the-art industrial-grade all-in-one automatic speech recognition system
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role and citation-polarity summary
years: 2026 (2)
roles: dataset (1)
polarities: use dataset (1)
representative citing papers
Dolphin-CN-Dialect is a compact ASR model that boosts Chinese dialect accuracy through balanced sampling of rare dialects and character-level tokenization while staying smaller than recent open-source competitors.
citing papers explorer
- MSAVBench: Towards Comprehensive and Reliable Evaluation of Multi-Shot Audio-Video Generation
MSAVBench is the first comprehensive benchmark for multi-shot audio-video generation, spanning video, audio, shot, and reference dimensions with an adaptive evaluation framework that reaches 91.5% Spearman correlation with human judgments.
- Dolphin-CN-Dialect: Where Chinese Dialects Matter
Dolphin-CN-Dialect is a compact ASR model that boosts Chinese dialect accuracy through balanced sampling of rare dialects and character-level tokenization while staying smaller than recent open-source competitors.